NIH; National Institutes of Health
NCBI; National Center for Biotechnology Information
NLM; National Library of Medicine
Databases; well-organized collections of data that can be accessed through a
query language.
• GenBank; all publicly available nucleotide sequences and their protein translations
• Protein Data Bank; 3D structures of proteins, nucleic acids and carbohydrates
Other important databases; ENA (European Nucleotide Archive) and DDBJ (DNA
Data Bank of Japan). These databases exchange data with GenBank daily.
Sequence retrieval systems for users; NCBI, Ensembl, ExPASy, DDBJ, etc.
The Entrez nucleotide database
NCBI developed the Entrez nucleotide database.
It is a search and retrieval system that covers a wide range of data domains,
including literature, nucleotide and protein sequences, complete genomes and
3D structures.
Collection of sequences from;
• GenBank; the NIH genetic sequence database, an annotated collection of all
publicly available DNA sequences.
GenBank is part of the International Nucleotide Sequence Database
Collaboration (INSDC), which comprises DDBJ, the European Molecular Biology
Laboratory (EMBL) and GenBank at NCBI.
• RefSeq; collection that aims to provide a comprehensive (complete), integrated
and non-redundant set of sequences; genomic DNA, transcripts (RNA) and protein
products. It serves as a basis for medical, functional and diversity studies.
• PDB (Brookhaven Protein Data Bank); database for 3D structures
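Retrieving a record from the Entrez nucleotide database can also be done programmatically through NCBI's E-utilities. The sketch below only builds the request URL for the EFetch endpoint and does not contact the server. The helper name efetch_url is hypothetical, and the accession U49845.1 is used purely as an example identifier; the parameter choices (db=nuccore, rettype=gb, retmode=text) are one reasonable way to ask for a GenBank flat file, not the only one.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an EFetch URL for a single record (hypothetical helper).

    db      : Entrez database to query ("nuccore" = nucleotide)
    rettype : record format ("gb" = GenBank flat file)
    retmode : transport format ("text" = plain text)
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example accession, chosen only for illustration.
url = efetch_url("U49845.1")
print(url)
```

Opening the printed URL in a browser, or passing it to urllib.request.urlopen, should return the record as plain text. NCBI asks users to limit the request rate, so bulk downloads need extra care.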
The GenBank file format
Locus; locus name, sequence length, molecule type, GenBank division and
modification date
Definition; short description of sequence
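Accession; unique identifier for a sequence record (a combination of letters and numbers)
Version; accession number with a version suffix (accession.version). The GI
(GenInfo Identifier) is an identification number assigned in parallel to the version.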
Keywords; describe the sequence
Source; where the sequence is from. Short name of the organism.
Organism; scientific name for source organism.
References; publications by the authors that discuss the data
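The header fields listed above can be read with very little code, because in a GenBank flat file the keyword occupies the first 12 columns of a line and the value starts after that; a line whose first 12 columns are blank continues the previous field. The following is a minimal sketch under that assumption, not a complete parser. The record in SAMPLE is made up for illustration (EXAMPLE01 and XX000000 are not real entries), and the function names parse_header, parse_locus and split_version are hypothetical.

```python
# Made-up header, laid out like a GenBank flat file (keyword in columns 1-12).
SAMPLE = """\
LOCUS       EXAMPLE01    120 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence, made up to illustrate the
            header layout.
ACCESSION   XX000000
VERSION     XX000000.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
"""


def parse_header(text):
    """Map each header keyword to its (possibly multi-line) value."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        tag, value = line[:12].strip(), line[12:].strip()
        if tag:  # a new keyword such as LOCUS or ORGANISM
            key = tag
            fields[key] = value
        elif key is not None:  # continuation of the previous field
            fields[key] = f"{fields[key]} {value}".strip()
    return fields


def parse_locus(locus_value):
    """Split the LOCUS value into the parts named in the notes."""
    tokens = locus_value.split()
    return {
        "name": tokens[0],
        "length": int(tokens[1]),      # tokens[2] is the unit, "bp"
        "molecule_type": tokens[3],
        "division": tokens[-2],        # GenBank division code
        "modification_date": tokens[-1],
    }


def split_version(version_value):
    """Return (accession, version number) from 'accession.version'."""
    first = version_value.split()[0]   # ignore anything after the first token
    accession, _, number = first.rpartition(".")
    return accession, int(number)


fields = parse_header(SAMPLE)
locus = parse_locus(fields["LOCUS"])
print(locus)
print(fields["DEFINITION"])
print(split_version(fields["VERSION"]))
```

For real records it is usually safer to rely on an established library; Biopython's Bio.SeqIO module, for example, reads the "genbank" format, including the feature table and sequence that this sketch ignores.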