Pre-EXAM
--------------------------------------------------------------------------------------------------------
1. You need to find out which genes are differentially expressed in cancer in
mouse. You want to study differential expression at gene level (as opposed to
transcript level) and you want to focus on known genes (as opposed to finding
novel ones). You have a limited budget for sequencing, so you have to select one
of the following options:
a) obtain 120 million reads (75 bp, single end) from one control sample and one
cancer sample
b) obtain 30 million reads (75 bp, single end) from four control samples and four
cancer samples
c) obtain 90 million reads (150 bp, single end) from one control sample and one
cancer sample
Which option would you choose and why?
Answer:
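b) Biological replicates are needed to estimate the within-group variance;
without them, a difference between the cancer and control samples cannot be
distinguished from ordinary sample-to-sample variation.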
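To make the reasoning concrete, here is a minimal Python sketch (not part of the expected answer). The counts are made-up numbers for a single hypothetical gene, and the helper name `within_group_variance` is an assumption introduced for illustration. It shows that a sample variance cannot be computed from one sample per group, whereas four replicates allow it.

```python
import statistics


def within_group_variance(counts):
    """Return the sample variance of one gene's counts within a group,
    or None when fewer than two replicates are available."""
    if len(counts) < 2:
        # With a single sample there is nothing to compare against,
        # so the within-group variance cannot be estimated.
        return None
    return statistics.variance(counts)


# Hypothetical normalized counts for one gene in the control group
# (made-up numbers, for illustration only).
option_a_control = [250]                  # one sample, as in options a) and c)
option_b_control = [240, 265, 251, 236]   # four replicates, as in option b)

var_a = within_group_variance(option_a_control)
var_b = within_group_variance(option_b_control)

print("one sample:     ", var_a)
print("four replicates:", var_b)
```

Tools such as edgeR use a more elaborate dispersion model than a plain sample variance, but they also depend on replicates to estimate it.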
2. If your focus were discovering novel transcripts rather than detecting
differential expression, what kind of reads would you order (single end / paired
end, shorter / longer) and why?
Answer:
Paired-end and longer reads, so that it is easier to differentiate between
transcripts.
3. RNA-seq data analysis involves several steps, and quality control can be
performed at different stages of the analysis. List which quality aspects can
be measured at each stage. As a bonus, name software for each of these steps.
Answer:
- raw reads (FASTQ): base quality, base composition. FastQC.
- aligned reads: mapping quality, saturation, whether reads map to genes or
  intergenic regions, uniformity of transcript coverage, etc. RSeQC.
- count table: MDS plot to see whether groups separate and whether there are
  confounding factors. edgeR.
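As an illustration of the raw-read check (not part of the expected answer), the sketch below decodes the quality line of a FASTQ record. It assumes the common Phred+33 encoding, in which a quality score Q is stored as the character with ASCII code Q + 33 and corresponds to an error probability of 10^(-Q/10). The function names and the example quality string are made up for illustration; FastQC reports the same kind of per-base quality in aggregated form.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(phred_score):
    """Convert a Phred score Q into the base-calling error probability."""
    return 10 ** (-phred_score / 10)


def mean_quality(quality_string):
    """Mean Phred score of one read; a simple per-read quality summary."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores)


# Made-up quality string for a 6-base read (illustration only).
example_quality = "IIII5!"

scores = phred_scores(example_quality)
print(scores)
print(mean_quality(example_quality))
print([error_probability(q) for q in scores])
```

Low scores towards the end of a read, as in this made-up example, are a typical reason to trim reads before alignment.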
--------------------------------------------------------------------------------------------------------
Practice exam NGS 1 (2p)
A common problem when performing de novo assembly of next-generation
sequencing reads is that the same sequence pattern can occur more than once in
the target sequence. This causes the assembly to break into a set of “contigs”.
In the illustrated case below, the same pattern, symbolized by a yellow arrow,
occurs three times in the sequenced genome:
Which of the following strategies could reasonably be expected to reduce this
problem? (2p)
                                                  YES   NO
I)   Using paired-end reads                       [ ]   [ ]
II)  Increasing the read length                   [ ]   [ ]
III) Quality trimming of the read data            [ ]   [ ]
IV)  “Barcoding” the DNA with index sequences     [ ]   [ ]
Check the “yes” or “no” box for each strategy: +0.5p per correct answer, -0.5p
per incorrect answer, 0p if both boxes are left blank. A negative total is
counted as 0p.
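The scoring rule can be sketched in a few lines of Python. The function name `score_question`, the item labels and the answer key below are hypothetical and chosen only to show the arithmetic; they are not the answer key for this question.

```python
def score_question(answer_key, given_answers, points_per_item=0.5):
    """Score a yes/no question: +points for a correct box, -points for an
    incorrect box, 0 for a blank item; a negative total counts as 0."""
    total = 0.0
    for item, correct_answer in answer_key.items():
        answer = given_answers.get(item)  # None means both boxes left blank
        if answer is None:
            continue
        if answer == correct_answer:
            total += points_per_item
        else:
            total -= points_per_item
    return max(total, 0.0)


# Hypothetical answer key (True = "yes", False = "no"); NOT the real key.
hypothetical_key = {"A": True, "B": True, "C": False, "D": False}

all_correct = score_question(hypothetical_key, dict(hypothetical_key))
one_blank = score_question(
    hypothetical_key, {"A": True, "B": True, "C": None, "D": False}
)
mostly_wrong = score_question(
    hypothetical_key, {"A": True, "B": False, "C": None, "D": True}
)

print(all_correct, one_blank, mostly_wrong)
```

Because an incorrect box costs as much as a correct one earns, leaving an item blank is the better choice when you would otherwise guess at random.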
Practice exam NGS 2 (4p)
A researcher is mapping her NGS reads to a reference sequence to find bases that
differ between the reference and the sequenced sample (called single nucleotide
polymorphisms, “SNPs”). One such SNP is reported at the position shown in the
figure.
The researcher is viewing the mapping output using the Tablet software with
settings enabled to show the read direction as green / blue and bases differing
from the reference sequence as red.