Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics,
computer science, mathematics, and statistics. It addresses data-intensive, large-scale biological
problems from a computational point of view. The most common problems
are modeling biological processes at the molecular level and making inferences from
collected data. A bioinformatics solution usually involves the following steps:
1. Collect statistics from biological data.
2. Build a computational model.
3. Solve a computational modeling problem.
4. Test and evaluate a computational algorithm.
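To make the four steps above concrete, here is a minimal Python sketch on made-up data. It assumes the simplest possible model, an independent-nucleotide model with pseudocounts, scored by log-odds against a uniform background. The function names, the toy sequences, and the choice of model are illustrative assumptions, not a method prescribed by this chapter.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def collect_statistics(seqs):
    """Step 1: collect statistics by counting each nucleotide across all sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s)
    return counts

def build_model(counts, pseudocount=1.0):
    """Step 2: build a model of per-base probabilities (bases assumed independent).
    The pseudocount keeps bases unseen in training from getting probability zero."""
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}

def log_odds_score(model, seq, background=0.25):
    """Step 3: solve the modeling problem by scoring a sequence against a uniform
    background. A positive score means the sequence resembles the training data
    more than random DNA."""
    return sum(math.log(model[b] / background) for b in seq)

def evaluate(model, labeled):
    """Step 4: test and evaluate the rule 'score > 0 means in-class' by its accuracy."""
    correct = sum((log_odds_score(model, s) > 0) == label for s, label in labeled)
    return correct / len(labeled)

# Toy data invented for illustration only (GC-rich "training" sequences).
training = ["ATGCGCGA", "GCGCATGC", "CGGCGCTA"]
held_out = [("GCGCGG", True), ("ATATTA", False)]

model = build_model(collect_statistics(training))
print(round(model["G"], 3), evaluate(model, held_out))  # prints 0.357 1.0
```

Real applications would use far larger data sets, richer models (for example Markov chains or hidden Markov models), and a properly separated test set; the sketch only shows how each step hands its result to the next.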
This chapter gives a brief introduction to bioinformatics, first introducing
biological terminology and then discussing some classical bioinformatics problems organized
by the type of data source. Sequence analysis is the analysis of DNA and protein
sequences for clues regarding function; it includes subproblems such as identification of
homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary
analysis. Protein structures are three-dimensional data, and the associated problems are
structure prediction (secondary and tertiary), analysis of protein structures for clues
regarding function, and structural alignment. Gene expression data are usually represented as
matrices, and analysis of microarray data mostly involves statistical analysis, classification,
and clustering. Biological networks such as gene regulatory networks, metabolic
pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic
approaches are used to solve associated problems such as the construction and
analysis of large-scale networks.
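The data types described above map naturally onto standard data structures. The following Python sketch, using invented values and hypothetical gene and protein names, shows one plausible representation for each: a sequence as a string, a protein structure as a list of 3-D atomic coordinates, expression data as a genes-by-samples matrix, and an interaction network as an adjacency list. Real analyses typically rely on dedicated file formats and libraries; this only illustrates the shape of each kind of data.

```python
import math
from collections import defaultdict
from statistics import mean

# Sequence data: a DNA sequence stored as a string (invented example).
dna = "ATGGCGTACGCTTAG"

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

# Structure data: 3-D coordinates (angstroms) of three C-alpha atoms (invented values).
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

def ca_distance(coords, i, j):
    """Euclidean distance between residues i and j."""
    return math.dist(coords[i], coords[j])

# Gene expression data: rows are genes, columns are samples (hypothetical values).
genes = ["geneA", "geneB", "geneC"]
expression = [
    [2.0, 2.2, 1.8],
    [8.0, 7.5, 8.5],
    [2.1, 1.9, 2.0],
]

def mean_expression(matrix):
    """Average expression of each gene across all samples."""
    return [mean(row) for row in matrix]

# Network data: an undirected protein-protein interaction graph as an adjacency list.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

def degree(g, node):
    """Number of interaction partners of a protein (0 if it is not in the graph)."""
    return len(g.get(node, ()))

print(round(gc_content(dna), 3))
print(ca_distance(ca_coords, 0, 2))
print(dict(zip(genes, mean_expression(expression))))
print(degree(graph, "P3"))
```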
History of Bioinformatics
• Bioinformatics emerged as a distinct field in the mid-1990s, building on earlier developments such as those below.
• From 1965 to 1978, Margaret O. Dayhoff established the first database of protein sequences,
published annually as a series of volumes entitled “Atlas of Protein Sequence and Structure”.
• From 1977, DNA sequences began to accumulate slowly in the literature, and it became more
common to predict protein sequences by translating sequenced genes than by sequencing
the proteins directly.
• As a result, the number of uncharacterised proteins (proteins whose sequence was known
but whose function had not been determined experimentally) began to increase.
• In 1980, there were enough DNA sequences to justify establishing dedicated nucleotide
sequence databases. GenBank, founded in 1982, is now maintained by the National Center for
Biotechnology Information (NCBI), USA, which serves as a primary provider of biological
databases.
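The bullets above describe predicting protein sequences by translating sequenced genes. The following Python sketch shows that translation step under simplifying assumptions: the standard genetic code (NCBI translation table 1), a coding sequence that starts in frame at its first base, and no introns. The names `translate` and `CODON_TABLE` are illustrative, not taken from any particular library.

```python
# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in the order T, C, A, G at each position,
# so "TTT" maps to index 0 and "GGG" to index 63; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(dna, to_stop=True):
    """Translate an in-frame coding DNA sequence into a one-letter protein string.

    Codons containing characters other than A, C, G, T become 'X'.
    If to_stop is True, translation ends at the first stop codon;
    otherwise stop codons are emitted as '*'. A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCGTACGCTTAG"))  # prints MAYA
```

Real gene annotation also requires locating the correct reading frame, removing introns from eukaryotic genes, and switching to alternative genetic codes where they apply (for example, in mitochondria).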
Objectives of Bioinformatics
1. Development of new algorithms and statistical methods for assessing the relationships
among large sets of biological data.
2. Application of these tools to the analysis and interpretation of various types of biological
data.
3. Development of databases for the efficient storage, access, and management of large
bodies of biological information.