Introduction
The Human Genome
- The human genome is now fully sequenced (approx. 3.3 billion bp)
- There are approx. 21,000 protein-coding genes (established experimentally), but the number of genes alone
cannot explain the complexity of the organism's phenotype
- Understanding the regulation of gene expression in the genome is key in determining the phenotype
Transcriptional Complexity in Eukaryotes
- Transcription differs between eukaryotes and prokaryotes.
- Non-coding transcripts can be expressed at high levels, but most are expressed at low levels.
- Eukaryotic genes are split: exons alternate with introns (exon, intron, exon, intron, ..., exon), and the introns are removed by splicing.
- Many genes have multiple promoters (and undergo alternative splicing); as a result, approx. 21,000 protein-coding genes
give rise to approx. 170,000 transcripts
- Many eukaryotic transcriptional units overlap in the genome, which adds further complexity to the regulation of gene expression
- The central dogma: DNA -> RNA (mRNA) -> amino acid sequence -> protein
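- Operons (clusters of genes co-transcribed from one promoter, which act as a regulatory unit) are a feature of prokaryotes, not eukaryotes; in eukaryotes, chromatin itself acts as a regulator of transcription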
- The logic is different:
o In prokaryotes, the default state is expression; DNA is free to be bound by proteins, and the
promoter is free to be bound by RNA polymerase.
▪ The regulation mechanisms are mainly repressive.
o In eukaryotes, the presence of chromatin means that the default state is no
expression.
▪ Gene expression is activated in 2 general steps: removal (or repositioning)
of nucleosomes to open the chromatin structure, and binding of transcription
factors, which can then access the genes
▪ Silencing or repression compacts the chromatin further, making transcription even less likely.
▪ Regulatory mechanisms either condense the chromatin structure or
loosen it.
- Because there are 2 levels of activation, all expressed genes have an open chromatin
structure, but not all genes with an open chromatin structure are expressed (they may
lack the required transcription factors)
- The presence of compartments also makes gene expression more complex.
o Nucleus: chromatin decompaction, transcription and mRNA processing (5’ capping, splicing and 3’
polyadenylation)
o Cytoplasm: translation, protein folding, transcription factor activation (which occurs prior to gene
transcription)
o Internal membranes are interconnected (e.g., the nuclear envelope is continuous with the endoplasmic reticulum).
- In summary, the central dogma is too simple.
o Eukaryotic gene structure is more complex, involving alternative exons and various promoters.
o Transcriptional units can overlap, and non-coding RNAs (ncRNAs) play a role in additional
regulatory mechanisms
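The combinatorial effect of alternative promoters and alternative exons can be sketched as a toy enumeration. This is a simplified model under stated assumptions. The gene layout and exon names (E1a, E2, ...) are hypothetical, and the function `enumerate_isoforms` is illustrative. Each promoter is assumed to contribute its own first exon. Each cassette exon is assumed to be included or skipped independently; real splicing is more constrained. Even so, 2 promoters and 3 cassette exons already give 2 x 2^3 = 16 transcripts from one gene. This shows how approx. 21,000 genes can yield roughly 8 times as many transcripts.

```python
from itertools import product

def enumerate_isoforms(first_exons, body_exons):
    """Enumerate transcripts of a hypothetical gene (toy model, not a real annotation).

    first_exons: one first exon per alternative promoter, e.g. ["E1a", "E1b"].
    body_exons:  exons in genomic order as (name, is_cassette) pairs; cassette exons
                 may be included or skipped, the others are always included.
    """
    cassette_positions = [i for i, (_, is_cassette) in enumerate(body_exons) if is_cassette]
    isoforms = []
    for first in first_exons:  # each promoter starts the transcript at its own first exon
        # Every cassette exon is independently included (True) or skipped (False).
        for choices in product((True, False), repeat=len(cassette_positions)):
            skipped = {pos for pos, keep in zip(cassette_positions, choices) if not keep}
            exons = [name for i, (name, _) in enumerate(body_exons) if i not in skipped]
            isoforms.append((first, *exons))  # exon order follows the genome
    return isoforms

# Hypothetical gene: 2 promoters and 3 cassette exons (E3, E5, E6).
isoforms = enumerate_isoforms(
    ["E1a", "E1b"],
    [("E2", False), ("E3", True), ("E4", False), ("E5", True), ("E6", True)],
)
print(len(isoforms))  # 16 = 2 promoters x 2**3 cassette combinations
```

Overlapping transcriptional units and ncRNAs are not modelled here. They would add further combinations on top of this count.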
Epigenetic Regulation of Gene Expression
- Definition: epigenetics studies the mechanisms by which heritable traits are transmitted (to daughter cells
or offspring) independently of the DNA sequence
- Main mechanisms: DNA methylation, post-translational modification of histones, nucleosome remodelers
- Phenotypes are determined by genes and epigenetic regulation.
o The set of transcripts expressed from the genome determines the phenotype.
o The high number of transcripts, the diversity of the proteome and the post-translational modification of
proteins increase the complexity and specificity of regulatory mechanisms
RNA Polymerase II
Catalytic Reaction by RNA Polymerase
- RNA polymerases catalyze the addition of a nucleotide to the 3’ end of the transcript to extend it.
o All nucleic acids in biological systems are synthesized 5’ to 3’
- The product of the reaction will be a transcript with n+1 nucleotides, and a pyrophosphate group (2
phosphates linked together)
- This reaction is catalyzed only when the incoming nucleotide base-pairs with the corresponding base of the template (RNA
polymerase catalyzes the reaction only when the nucleotide is correctly paired)
o The enzyme catalyzes the nucleophilic attack of the 3’-OH of the transcript on the α phosphate of the incoming nucleotide
- Magnesium ions (Mg2+) help compensate for the negative charges of the incoming nucleotide's phosphates.
- The most important cofactor is the template, because without it the incoming nucleotide cannot be stabilized and bound
- The sequence of the transcript is complementary to the template strand.
- Transcription is highly processive: the polymerase adds many nucleotides without dissociating from the template.
- The central roles of RNA polymerase are to unwind the double helix, polymerize the RNA, and proofread the
transcript
- RNA polymerase II (RNAPII) assembles into larger initiation and elongation complexes, capable of promoter
recognition and response to regulatory signals
o Additional factors must bind to the polymerase for it to elongate and to terminate transcription.
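The complementarity rule and the 5’ to 3’ direction of synthesis can be illustrated with a minimal sketch. It assumes a DNA template strand written 5’ to 3’ that contains only A, C, G and T. The function names `transcribe` and `pyrophosphates_released` are hypothetical. Because the template is read 3’ to 5’, the code walks it from its 3’ end and emits the complementary ribonucleotide, with U in place of T. It also counts the pyrophosphate released per addition: (RNA)n + NTP -> (RNA)n+1 + PPi. This count assumes the first nucleotide keeps its 5’ triphosphate, as in a fresh, uncapped transcript. A transcript of n nucleotides therefore involves n-1 additions.

```python
# Base pairing between a DNA template base and the ribonucleotide added opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the RNA, written 5'->3', copied from a DNA template strand written 5'->3'."""
    rna = []
    # The polymerase reads the template 3'->5', so start at the template's 3' end;
    # each new ribonucleotide is added to the 3' end of the growing RNA.
    for base in reversed(template_5_to_3.upper()):
        rna.append(TEMPLATE_TO_RNA[base])
    return "".join(rna)

def pyrophosphates_released(transcript_length: int) -> int:
    """One PPi per phosphodiester bond: n nucleotides are joined by n - 1 bonds."""
    return max(transcript_length - 1, 0)

template = "TACGGT"                            # template strand, 5'->3'
print(transcribe(template))                    # ACCGUA
print(pyrophosphates_released(len(template)))  # 5
```

The result ACCGUA matches the coding strand (5’-ACCGTA-3’) with T replaced by U. This is why the coding strand is also called the sense strand.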
Transcription Start Site (TSS)
- Synthesis always proceeds 5’ to 3’ (the template strand is read 3’ to 5’).
- The last 9-10 nucleotides of the transcript (its 3’ end) are annealed to the template, forming an RNA-DNA hybrid.
- This hybrid structure is only temporary and must be disrupted.
- Pol II unwinds the hybrid and separates the transcript from the template.
- The template strand then re-anneals with the coding strand as it exits the transcription bubble.
- The TSS is at the beginning of the transcript (+1)
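Promoter numbering relative to the TSS can be made concrete with a small helper. This sketch assumes 1-based genomic coordinates and a known TSS coordinate and strand; the name `tss_relative_position` is hypothetical. By convention the TSS is +1 and the nucleotide immediately upstream is -1. There is no position 0, so the helper skips zero when converting.

```python
def tss_relative_position(coord: int, tss: int, strand: str = "+") -> int:
    """Convert a 1-based genomic coordinate to promoter numbering around the TSS.

    +1 is the TSS, -1 is the nucleotide just upstream, and position 0 does not exist.
    For a minus-strand gene, upstream means higher genomic coordinates.
    """
    if strand == "+":
        offset = coord - tss
    elif strand == "-":
        offset = tss - coord
    else:
        raise ValueError("strand must be '+' or '-'")
    # Offsets 0, 1, 2, ... (TSS and downstream) become +1, +2, +3, ...;
    # negative offsets (upstream) are already -1, -2, -3, ...
    return offset + 1 if offset >= 0 else offset

print(tss_relative_position(1000, 1000))       # 1
print(tss_relative_position(999, 1000))        # -1
print(tss_relative_position(970, 1000))        # -30
print(tss_relative_position(1001, 1000, "-"))  # -1
```

For example, an element lying 30 nt upstream of the TSS is numbered -30. On a minus-strand gene the same element sits at a higher genomic coordinate than the TSS.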
Promoter Classes
- In eukaryotes there are 2 general classes of promoters
- 60-70% of Pol II promoters are CpG islands, regions of the genome that are rich in CpG dinucleotides.
- CpG sites are the main targets of DNA methylation.
o Methylation regulatory mechanisms act at CpG sites (a cytosine followed by a guanine), so these sites act as
regulatory regions; because methylated cytosines tend to mutate to thymine, CpGs are underrepresented in the genome outside CpG islands
o CpG islands are 300-500 bp long.
o CpG island promoters are characterized by multiple TSSs.
- TATA-box promoters (20-30%) are characterized by a single TSS.
- There can be other upstream and downstream elements.
- The initiator element (Inr) contains the initiation site, labelled +1.
- Eukaryotic RNA polymerases cannot recognize and bind to promoters directly; they require general transcription factors.
o This is not a weakness, but an additional regulatory mechanism; if they could bind autonomously
with high affinity, they would be difficult to regulate
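CpG island promoters can also be recognized computationally. The sketch below uses a widely used heuristic: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. These thresholds are a common convention, not the only definition. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical. The observed/expected ratio compares the number of CpG dinucleotides with the number expected if C and G were distributed independently (C x G / length). Genome-wide CpG depletion therefore shows up as a low ratio.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC-content and CpG o/e thresholds (a common heuristic)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp

print(looks_like_cpg_island("CG" * 100))    # True
print(looks_like_cpg_island("CCAGG" * 50))  # False: GC-rich but contains no CpG
```

The second example shows why GC content alone is not enough. A sequence can be 80% G+C and still be depleted of CpG dinucleotides.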
Transcription
- Transcription can be divided into 3 steps
1. Initiation: the polymerase (with the general transcription factors) binds to the promoter,
and the transcription bubble opens
2. Elongation: the polymerase moves along the template and the transcript
is extended
3. Termination: disassembly of the transcription complex and
bubble, and release of the RNA transcript
- All these steps provide a way of regulating gene expression (most regulatory
processes act at pre-initiation and initiation, so no nucleotides or energy are wasted)
Eukaryotic RNA Polymerases
- Transcription is specific in the sense that each polymerase transcribes a specific class of genes.
- Polymerase I: the large rRNAs (28S, 18S and 5.8S)
o The vast majority of cellular RNA is rRNA, so Pol I is very active
o Concentrated in the nucleolus (the region where rDNA chromatin is organized)
o Only 1 type of promoter; least regulated
- Polymerases II and III are spread across the nucleus and not restricted to a specific region.
- Polymerase III: tRNA and 5S rRNA
- Polymerase II: all mRNAs (protein-coding genes), most small nuclear RNAs (snRNAs) and many other non-coding RNAs
o Uses many promoters whose activity varies depending on which genes are expressed
o It is subject to many more regulatory mechanisms.
α-Amanitin
- A highly toxic substance that was found to have different effects
on the 3 polymerases
o At low concentrations it inhibits Pol II completely while
having no effect on Pol I and III
o At approx. 1000-fold higher concentrations, the toxin also inhibits Pol III
in most eukaryotes
o Pol I is completely resistant to it.
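Differential sensitivity to α-amanitin is a classic way to assign a gene to a polymerase. One measures whether the gene's transcription is blocked at a low dose and at a dose about 1000-fold higher, then compares the pattern with each polymerase's expected response. A minimal sketch of that logic follows, under these assumptions:
- The doses are relative units taken from the notes (1 = the dose that blocks Pol II), not real concentrations.
- The 1000-fold threshold for Pol III does not hold in every eukaryote.
- All names are hypothetical.

```python
# Relative α-amanitin dose that blocks each polymerase (1.0 = the low dose that blocks
# Pol II); None means resistant. Relative units only, following the notes.
INHIBITING_DOSE = {"Pol I": None, "Pol II": 1.0, "Pol III": 1000.0}
LOW_DOSE, HIGH_DOSE = 1.0, 1000.0

PRODUCTS = {
    "Pol I": "large rRNAs (28S, 18S, 5.8S)",
    "Pol II": "mRNAs, most snRNAs and many other ncRNAs",
    "Pol III": "tRNAs and 5S rRNA",
}

def is_inhibited(pol: str, relative_dose: float) -> bool:
    """True if the given relative dose reaches this polymerase's inhibiting dose."""
    threshold = INHIBITING_DOSE[pol]
    return threshold is not None and relative_dose >= threshold

def assign_polymerase(blocked_at_low: bool, blocked_at_high: bool) -> str:
    """Infer which polymerase transcribes a gene from its α-amanitin sensitivity."""
    observed = (blocked_at_low, blocked_at_high)
    for pol in INHIBITING_DOSE:
        # Expected pattern for this polymerase at the two test doses.
        expected = (is_inhibited(pol, LOW_DOSE), is_inhibited(pol, HIGH_DOSE))
        if expected == observed:
            return pol
    return "inconsistent with the three-polymerase model"

pol = assign_polymerase(blocked_at_low=False, blocked_at_high=True)
print(pol, "->", PRODUCTS[pol])  # Pol III -> tRNAs and 5S rRNA
```

The expected patterns are: Pol II blocked at both doses, Pol III only at the high dose, and Pol I at neither. Any other observation falls outside this simple model.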
RNA Polymerase II Subunits
- RNA Pol II is a protein complex made of 12 subunits.
- In denaturing gel electrophoresis (SDS-PAGE), the subunits are separated based on molecular weight (the heavier
the protein, the shorter the distance it travels through the gel)
o The enzyme has multiple subunits which can be present in multiple copies; the
more copies there are, the larger and darker the band
- Radioactive labelling is also used to study the subunits' composition and
regulatory mechanisms
o Radioactive sulfur (35S) labels methionine and cysteine residues; the signal is strongest in Rpb1 and Rpb2
o Radioactive phosphate (32P) is used to determine whether some subunits are phosphorylated
(a major mechanism of protein function regulation)
▪ a signal indicates that the subunit is phosphorylated, and thus likely regulated.
▪ phosphorylation of Pol II occurs mainly on Rpb1, and less intensely on Rpb6
- Subunits Rpb4 and Rpb9 have conditional deletion phenotypes, meaning that the enzyme can work without them,
at least under some conditions
o The protein complex is generally stable, but these two subunits have a tendency to dissociate
o Deletion phenotype experiments delete a specific gene to observe whether the cell can
survive without it (under optimal growth conditions)
- The core subunits are Rpb1, 2 and 3; they are essential for enzyme activity
o They are orthologous to the β’ (Rpb1), β (Rpb2) and α (Rpb3) subunits of prokaryotic RNA
polymerase
- The common subunits are Rpb5, 6, 8, 10 and 12; they are shared by all three nuclear RNA polymerases (I, II and III)
o While their functions are not well known, their presence in all nuclear polymerases suggests that they are
fundamental to the transcription process
o Rpb6 is orthologous to the ω subunit of prokaryotic RNA polymerase.
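Molecular weights of subunit bands in SDS-PAGE are usually estimated from a standard curve. Over the linear range of the gel, log10(MW) of marker proteins is approximately a linear function of relative mobility (Rf, the distance migrated divided by the distance of the dye front). The sketch below fits that line by least squares in plain Python. The function names and the marker values in the usage example are hypothetical. The linear model is an approximation that breaks down outside the gel's resolving range.

```python
import math

def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: (relative_mobility, molecular_weight_kDa) pairs for the marker proteins.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # negative: heavier proteins migrate less (lower Rf)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_mw(rf, slope, intercept):
    """Read an unknown band's molecular weight (kDa) off the fitted standard curve."""
    return 10 ** (slope * rf + intercept)

# Hypothetical marker ladder: (Rf, kDa). Replace with the ladder actually run on the gel.
markers = [(0.10, 200.0), (0.25, 116.0), (0.40, 66.0), (0.60, 45.0), (0.80, 31.0)]
slope, intercept = fit_standard_curve(markers)
print(round(estimate_mw(0.20, slope, intercept), 1))  # estimated kDa of a band at Rf 0.20
```

Interpolating between markers is more reliable than extrapolating. Bands from the largest subunits (Rpb1, Rpb2) should therefore be bracketed by high-MW markers.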
Rpb1 – The Largest Subunit
- Rpb1 is particularly important: it is the main Pol II subunit regulated by phosphorylation
- RNA polymerases in eukaryotes and prokaryotes are structurally similar (conserved), but
eukaryotic Pol II has a specific carboxy-terminal domain (CTD)
- The CTD protrudes from the enzyme as an external, flexible structure.
o It is a protein domain whose function is to recruit factors when they are needed during
transcription
- The CTD is made of tandem repeats of a 7-amino-acid sequence (heptad).
o The number of repeats correlates with evolutionary complexity (yeast = 26, mammals = 52)
o The amino acids Tyr (tyrosine), Ser (serine) and Thr (threonine) all have hydroxyl groups, so
they are the sites where phosphorylation can occur
o Pro (proline) is present to prevent the formation of secondary
structures
o The structure is not fixed and can be altered for regulation.
- YSPTSPS is the consensus sequence, but variation can occur.
o Positions 1 and 6 are the most conserved, and position 7 is the least conserved
- Rpb1 is found in 3 main forms: IIa, IIb and IIo
o IIa is the unphosphorylated (hypophosphorylated) form of Rpb1.
o Kinases can phosphorylate it into IIo (the hyperphosphorylated form)
▪ Different levels of phosphorylation provide different functions.
o IIb is unphosphorylated only because it lacks the CTD (removed by proteolysis), so it cannot be phosphorylated
▪ It has no physiological role and is not found naturally in the cell
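The heptad repeat structure of the CTD can be analysed with a short sketch. It does four things:
- splits a sequence into consecutive 7-residue repeats;
- counts exact matches to the YSPTSPS consensus;
- reports how many repeats carry the consensus residue at each position;
- counts hydroxyl-bearing residues (Tyr, Ser, Thr) as potential phosphorylation sites.

The input is a made-up 3-repeat example, not a real CTD sequence, and the function names are hypothetical. Real CTDs have 26 (yeast) to 52 (mammals) repeats and may not split cleanly into frames of 7.

```python
CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7

def split_heptads(ctd: str) -> list[str]:
    """Split a CTD sequence into consecutive 7-residue repeats (an incomplete tail is dropped)."""
    return [ctd[i:i + 7] for i in range(0, len(ctd) - 6, 7)]

def consensus_report(ctd: str) -> dict:
    """Summarize how closely each heptad follows the YSPTSPS consensus."""
    heptads = split_heptads(ctd)
    exact = sum(1 for h in heptads if h == CONSENSUS)
    # For each position 1-7, count the heptads that carry the consensus residue.
    per_position = [sum(1 for h in heptads if h[i] == CONSENSUS[i]) for i in range(7)]
    # Tyr, Ser and Thr carry hydroxyl groups and can therefore be phosphorylated.
    phosphoacceptors = sum(h.count(aa) for h in heptads for aa in "YST")
    return {"heptads": len(heptads), "exact": exact,
            "per_position": per_position, "phosphoacceptors": phosphoacceptors}

# Made-up example: one consensus repeat and two variants.
report = consensus_report("YSPTSPS" "YSPTSPK" "YSPSSPN")
print(report["per_position"])  # [3, 3, 3, 2, 3, 3, 1]
```

In this toy input, position 7 varies most, consistent with the notes. On a real CTD the per-position counts show which positions are most conserved.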