Protein Secondary Structure Prediction
• Protein secondary structures are stable local conformations of a
polypeptide chain.
• They are critically important in maintaining a protein 3D structure.
• The highly regular and repeated structural elements include α-helices
and β-sheets.
• It has been estimated that nearly 50% of the residues of a protein fold
into either α-helices or β-strands.
• An α-helix is a spiral-like structure with 3.6 amino acid residues per
turn.
• Prolines normally do not occur in the middle of helical segments, but
can be found at the end positions of α-helices.
• A β-sheet consists of two or more β-strands having an extended zigzag
conformation.
• The structure is stabilized by hydrogen bonding between residues of
adjacent strands, which actually may be long-range interactions at the
primary structure level.
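The helix geometry stated above (3.6 residues per turn) implies a rotation of 100° per residue around the helix axis. The following is a minimal sketch of that arithmetic. The function names and the 65° tolerance are illustrative assumptions, not part of the original slides.

```python
RESIDUES_PER_TURN = 3.6  # value given in the text above
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # 100 degrees per residue


def wheel_angle(index: int) -> float:
    """Angular position (degrees, 0 to 360) of residue `index` around the helix axis."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def same_face(i: int, j: int, tolerance: float = 65.0) -> bool:
    """True if residues i and j point to roughly the same side of the helix.

    `tolerance` is a hypothetical cutoff chosen for illustration only.
    """
    diff = abs(wheel_angle(i) - wheel_angle(j))
    diff = min(diff, 360.0 - diff)  # shortest way around the circle
    return diff <= tolerance


if __name__ == "__main__":
    for i in range(8):
        print(i, round(wheel_angle(i), 1))
    print(same_face(0, 4), same_face(0, 2))
```

Under these assumptions, residues i and i+4 lie about 40° apart and so face the same side of the helix, while i and i+2 lie on nearly opposite sides. After 18 residues (5 full turns) the pattern returns to its starting angle.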
• β-Strands at the protein surface show an alternating pattern of
hydrophobic and hydrophilic residues; buried strands tend to contain
mainly hydrophobic residues.
• Protein secondary structure prediction refers to the prediction of the
conformational state of each amino acid residue of a protein sequence
as one of the three possible states, namely, helices, strands, or coils,
denoted as H, E, and C, respectively.
• The prediction is based on the fact that secondary structures have a
regular arrangement of amino acids, stabilized by hydrogen bonding
patterns.
• This structural regularity serves as the foundation for prediction
algorithms.
• Predicting protein secondary structures has a number of applications.
• It can be useful for the classification of proteins and for the separation
of protein domains and functional motifs.
• Secondary structures are much more conserved than sequences during
evolution.
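The three-state encoding described above (H, E, or C for each residue) is naturally represented as a string of the same length as the sequence. The sketch below collapses such a string into secondary structure segments. The function names and the example string are hypothetical and serve only as illustration.

```python
from itertools import groupby

# Three conformational states named in the text above.
STATES = {"H": "helix", "E": "strand", "C": "coil"}


def segments(states: str) -> list[tuple[str, int, int]]:
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 0-based and inclusive.
    """
    invalid = set(states) - set(STATES)
    if invalid:
        raise ValueError(f"unexpected state symbols: {sorted(invalid)}")
    runs = []
    pos = 0
    for state, group in groupby(states):
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


def composition(states: str) -> dict[str, float]:
    """Fraction of residues assigned to each of the three states."""
    if not states:
        return {}
    return {s: states.count(s) / len(states) for s in STATES}


if __name__ == "__main__":
    example = "CCHHHHHHCCEEEEC"  # made-up assignment, one symbol per residue
    print(segments(example))
    print(composition(example))
```

Working with segments rather than single residues is convenient when the predicted elements are used to guide a sequence alignment, since whole helices or strands can then be compared between sequences.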
• As a result, correctly identifying secondary structure elements (SSE) can
help to guide sequence alignment or improve existing sequence
alignment of distantly related sequences.
• In addition, secondary structure prediction is an intermediate step in
tertiary structure prediction, as in threading analysis.
• Because of significant structural differences between globular proteins
and transmembrane proteins, the two classes require very different
approaches to predicting their secondary structure elements.
Secondary structure prediction for globular proteins
• Predicting protein secondary structure with high accuracy is not a
trivial task.
• It remained a very difficult problem for decades.
• This is because protein secondary structure elements are context
dependent.
• The formation of α-helices is determined by short-range interactions,
whereas the formation of β-strands is strongly influenced by long-range
interactions.
• Prediction for long-range interactions is theoretically difficult.
• After more than three decades of effort, prediction accuracies have
only been improved from about 50% to about 75%.
• Secondary structure prediction methods are either ab initio based,
using single-sequence information only, or homology based, using
multiple sequence alignment information.
• The ab initio methods, which belong to the early generation of methods,
predict secondary structures based on statistical calculations of the
residues of a single query sequence.
• The homology-based methods do not rely on statistics of residues of a
single sequence, but on common secondary structural patterns
conserved among multiple homologous sequences.
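The two ideas above can be sketched in a few lines: combining per-sequence state assignments across an alignment, and scoring a prediction as the fraction of residues whose state is correct (the per-residue three-state accuracy, often called Q3). This is a deliberately simplified illustration, not any published method. The majority-vote rule, the tie-breaking to coil, the gap symbol `-`, and all example strings are assumptions made for the sketch.

```python
from collections import Counter


def consensus_states(aligned_predictions: list[str]) -> str:
    """Column-wise majority vote over aligned per-sequence state strings.

    Gap characters ('-') are ignored. Ties and all-gap columns fall back
    to coil ('C'), an arbitrary choice made for this sketch.
    """
    if not aligned_predictions:
        return ""
    length = len(aligned_predictions[0])
    if any(len(p) != length for p in aligned_predictions):
        raise ValueError("aligned predictions must have equal length")
    consensus = []
    for column in zip(*aligned_predictions):
        counts = Counter(s for s in column if s != "-")
        if not counts:
            consensus.append("C")
            continue
        ranked = counts.most_common()
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append("C" if tied else top_state)
    return "".join(consensus)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E, C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    # Hypothetical per-sequence predictions for three aligned homologs.
    predictions = ["HHHHCCEEE", "HHHCCCEEE", "HHHH-CEEC"]
    observed = "HHHHCCCEE"  # made-up reference assignment
    combined = consensus_states(predictions)
    print(combined, round(q3_accuracy(combined, observed), 3))
```

The vote illustrates why homologous sequences help: a state supported by most aligned sequences at a position overrides an isolated disagreement in any single sequence. An accuracy of "about 75%" in the sense used above corresponds to a `q3_accuracy` value of about 0.75.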