Why sequence a genome?
- Understand the function and evolution of an organism (evolution of the genome and
evolution of its relationships with other species)
- Find out how genes work together to direct the growth, development, and maintenance of
an organism
- Find correlations between genome information and the development of cancer,
susceptibility to certain diseases, and drug metabolism (pharmacogenomics). This enables
precision medicine.
- Study gene expression in a specific tissue, organ, or tumour
- Understand how gene expression is regulated in a particular environment
de novo sequencing
Sequencing a novel genome for which no reference sequence is available for
alignment. The genome is constructed from overlaps between reads.
resequencing
The genome has been sequenced before, so reads can be compared with a reference to
identify variants.
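A minimal sketch of the comparison step in resequencing, assuming the sample sequence has already been aligned to the reference with no insertions or deletions. `find_variants` is a hypothetical helper; real variant calling also weighs read depth and base quality.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Report (0-based position, reference base, sample base) wherever they differ.

    Assumes both sequences are already aligned to the same length
    (no insertions or deletions), which is a simplification.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


print(find_variants("ACGTTAGC", "ACGATAGG"))  # [(3, 'T', 'A'), (7, 'C', 'G')]
```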
first generation sequencing
Sanger sequencing
Production: rooms of equipment, sample preparation, 35 people, 3-4 weeks
Sequencing: 74 capillary sequencers, 10 people, 15-40 runs per day, 1-2 Mb per
instrument per day, 120 Mb total capacity per day
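The capacity figures above can be sanity-checked with a little arithmetic. The ~3 Gb genome size (roughly human-sized) and the use of the 10x coverage target from the assembly notes are assumptions for illustration, not figures from the source.

```python
INSTRUMENTS = 74
MB_PER_INSTRUMENT_PER_DAY_LOW = 1
MB_PER_INSTRUMENT_PER_DAY_HIGH = 2

# 74 instruments at 1-2 Mb each brackets the quoted ~120 Mb/day total.
low = INSTRUMENTS * MB_PER_INSTRUMENT_PER_DAY_LOW
high = INSTRUMENTS * MB_PER_INSTRUMENT_PER_DAY_HIGH
print(f"Facility capacity: {low}-{high} Mb/day")  # Facility capacity: 74-148 Mb/day

GENOME_MB = 3_000          # assumption: ~3 Gb genome, roughly human-sized
COVERAGE = 10              # assumption: 10x target from the assembly notes
CAPACITY_MB_PER_DAY = 120  # quoted total capacity

days = GENOME_MB * COVERAGE / CAPACITY_MB_PER_DAY
print(f"About {days:.0f} days of sequencing")  # About 250 days of sequencing
```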
Sanger sequencing
- Sanger sequencing relies on the incorporation of dideoxynucleotides (ddNTPs). These lack
a 3'-OH group, which prevents the addition of further nucleotides and so terminates the
DNA strand.
- The DNA to be sequenced is first fragmented and then cloned into a vector or
amplified by PCR.
- A short primer complementary to the vector or a known sequence is annealed to the
single-stranded DNA template.
- DNA polymerase extends the primer, incorporating normal deoxynucleotides (dNTPs)
and occasionally incorporating a ddNTP, which terminates the extension. This results in
a mixture of DNA fragments of varying lengths.
- Fragments are separated by size, originally by gel electrophoresis and now by capillary
electrophoresis.
- The terminal ddNTPs are labeled with fluorescent dyes, allowing the fragments to be
detected by a laser as they pass through the gel or capillary. The sequence is
determined by analyzing the order of the fluorescent signals.
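- In the original method, four separate reactions (one per ddNTP) were run in four lanes
side by side, and the sequence was read one nucleotide at a time from the band order.
With four differently labelled dyes, all four can be read from a single lane or capillary.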
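A toy simulation of chain termination and read-out, assuming every possible stop position is represented in the reaction mixture and ignoring strand orientation (the new strand is written in the same left-to-right order as the template). The function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_with_termination(template: str, stop: int) -> str:
    """Build the complementary strand, stopping after index `stop`,
    where a ddNTP (no 3'-OH) was incorporated."""
    return "".join(COMPLEMENT[base] for base in template[: stop + 1])


def sanger_reaction(template: str, seed: int = 0) -> list[str]:
    """A mixture of terminated fragments, one for every possible stop point."""
    fragments = [extend_with_termination(template, i) for i in range(len(template))]
    random.Random(seed).shuffle(fragments)  # the tube holds an unordered mixture
    return fragments


def read_electropherogram(fragments: list[str]) -> str:
    """Size separation: the shortest fragments reach the detector first, and the
    dye on each fragment's terminal ddNTP gives the base at that position."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


print(read_electropherogram(sanger_reaction("TACGGATC")))  # ATGCCTAG
```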
Fragmentation of a genome
Genomes are very long, so they need to be broken down: cut them randomly into manageable
pieces, sequence them, then put them back together. Genomes are big, but sequencing
technology produces short reads.
Assemble the fragments to reconstruct the original DNA, then scaffold across the gaps.
sequencing overlaps
Overlaps between reads are crucial for arranging the sequences back into the order in
which they occur in the genome.
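A minimal sketch of detecting the overlap between two reads, using the 4-nucleotide minimum overlap from the sequence assembly notes as the default. `suffix_prefix_overlap` is a hypothetical helper name.

```python
def suffix_prefix_overlap(left: str, right: str, min_overlap: int = 4) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if the best overlap is shorter than `min_overlap`, because very
    short matches could occur by chance.
    """
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


print(suffix_prefix_overlap("ATTAGACCTG", "CCTGCCGGAA"))  # 4
print(suffix_prefix_overlap("ACGTAC", "ACGGGG"))          # 0 (only a 2-base match)
```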
pros and cons of Sanger
Pros: lowest error rate, long read length (around 750 bp), primers can target a specific
region
Cons: high cost per base, long time to generate data, need for cloning, little data per
run (low throughput)
shotgun genome sequencing
A method used to sequence long DNA strands by breaking them into random small
fragments, sequencing these fragments individually, and then reassembling the
sequences to reconstruct the original genome.
- The entire genome is randomly broken into many small overlapping fragments.
- These fragments are cloned into vectors or prepared for sequencing by attaching
adapters. There are two kinds of cloning: enzymatic (PCR) or bacterial.
- Each fragment is sequenced independently
- Computational methods are used to align and assemble the overlapping sequences
into a continuous sequence that represents the original genome. Software tools identify
overlaps between fragments and piece them together to form longer contiguous
sequences, called contigs.
- Any gaps in the assembled sequence are filled, and the final sequence is validated for
accuracy.
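A toy end-to-end sketch of the shotgun steps, using one simple assembly strategy (greedy merging of the pair with the longest overlap); real assemblers use more sophisticated graph-based methods. To keep the demo deterministic, the "random" fragments are evenly spaced 8-base reads overlapping by 4 bases, presented in shuffled order. The genome is made up and has no repeated 4-mers; repeats are what make greedy merging misassemble.

```python
import random


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`, or 0 if below min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def shotgun_reads(genome: str, read_len: int, step: int, seed: int = 0) -> list[str]:
    """Overlapping fragments in random order (evenly spaced for a reproducible demo)."""
    reads = [genome[i : i + read_len] for i in range(0, len(genome) - read_len + 1, step)]
    random.Random(seed).shuffle(reads)
    return reads


def greedy_assemble(reads: list[str], min_overlap: int = 4) -> list[str]:
    """Repeatedly merge the two contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATGCGTACCTTAGGACTCAA"
reads = shotgun_reads(genome, read_len=8, step=4)
print(greedy_assemble(reads))  # ['ATGCGTACCTTAGGACTCAA']
```

If a read is missing, assembly stops with more than one contig, which is where gap filling comes in.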
ways to cut DNA
sonication, restriction enzymes, hydro-shearing, enzymatic shearing
sonication
The use of high-frequency sound waves to break open cells; in sequencing, it is used to
shear DNA into random fragments.
restriction enzymes
Enzyme that cuts DNA at a specific sequence of nucleotides
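A minimal sketch of a restriction digest, using EcoRI as the example enzyme: it recognises GAATTC and cuts between the G and the first A (G^AATTC). Only the top strand is modelled (sticky ends are ignored), and `digest` is a hypothetical helper. Unlike sonication, the cut points are fixed by the sequence rather than random.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand `cut_offset` bases into every occurrence of `site`.

    Defaults model EcoRI (G^AATTC); the complementary strand is ignored.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start : pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(digest("AAGAATTCTTGAATTCCC"))  # ['AAG', 'AATTCTTG', 'AATTCCC']
```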
DNA sequence assembly
Combining sequence reads to build the entire sequence of the template DNA. The consensus
takes the most common nucleotide at each position.
A minimum overlap threshold of 4-5 nucleotides is needed; anything shorter could be a
chance match (a misfit).
Some sites can be called with confidence and others cannot. If a particular area has not
been sequenced enough, more reads need to be produced.
A quality-control criterion is needed: aim to sequence every nucleotide at least 10 times
over (10x coverage), as this allows errors to be corrected.
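A minimal sketch of the 10x quality-control check, assuming each read's position on the assembled sequence is already known as a (start, length) pair. The read placements are made-up numbers, and both helper names are hypothetical. The example shows that average coverage can exceed 10x while some positions still fall below it.

```python
def coverage_depth(seq_len: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Number of reads covering each position; alignments are (start, length)."""
    depth = [0] * seq_len
    for start, length in alignments:
        for pos in range(start, min(start + length, seq_len)):
            depth[pos] += 1
    return depth


def low_confidence_sites(depth: list[int], min_depth: int = 10) -> list[int]:
    """Positions that need more sequencing to reach the target depth."""
    return [pos for pos, d in enumerate(depth) if d < min_depth]


alignments = [(0, 6)] * 12 + [(5, 5)] * 8   # hypothetical read placements
depth = coverage_depth(10, alignments)
mean = sum(depth) / len(depth)
print(depth)                        # [12, 12, 12, 12, 12, 20, 8, 8, 8, 8]
print(round(mean, 1))               # 11.2
print(low_confidence_sites(depth))  # [6, 7, 8, 9]
```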
consensus
A consensus sequence represents the most common nucleotide at each position in a set of
aligned sequences: the sequences are compared and the most frequent nucleotide is chosen
at each position.
MAJORITY VOTE across the reads in a contig
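A minimal majority-vote consensus, assuming the reads are already aligned column by column, with '-' marking positions a read does not cover. The reads are made up; `consensus` is a hypothetical helper, and ties go to whichever base is seen first.

```python
from collections import Counter


def consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Majority vote at each column of an alignment; 'N' if no read covers it."""
    length = max(len(read) for read in aligned_reads)
    result = []
    for col in range(length):
        bases = [r[col] for r in aligned_reads if col < len(r) and r[col] != gap]
        if not bases:
            result.append("N")
            continue
        result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)


reads = [
    "ACGTTGCA",
    "ACGATGCA",   # the A at column 3 is outvoted by the three T's
    "--GTTGCAT",
    "ACGTTG--",
]
print(consensus(reads))  # ACGTTGCAT
```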
Sequencing heterozygous individuals
At a heterozygous site, the reads should split roughly 50/50 between the two alleles; if
the split is far from 50/50, the minority base is likely an error.
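A minimal sketch of telling a heterozygous site from a sequencing error using the share of the second most common base. The 0.3-0.7 "roughly 50/50" window and the 10-read minimum depth are hypothetical thresholds chosen for illustration, and `classify_site` is a hypothetical helper.

```python
from collections import Counter


def classify_site(bases: str, het_range: tuple[float, float] = (0.3, 0.7),
                  min_depth: int = 10) -> str:
    """Classify one position from the bases observed in the reads covering it."""
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return "low coverage"
    (top, _), *rest = counts.most_common()
    if not rest:
        return f"homozygous {top}"
    second, n_second = rest[0]
    fraction = n_second / depth  # share of reads showing the second allele
    if het_range[0] <= fraction <= het_range[1]:
        return f"heterozygous {top}/{second}"
    return f"likely error; call {top}"


print(classify_site("AAAAAGGGGG"))  # heterozygous A/G
print(classify_site("AAAAAAAAAG"))  # likely error; call A
```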