UNSW Embryology

DNA Databanks


Introduction

DNA sequenced from any organism anywhere in the world typically ends up as a record in a DNA databank. As of April 1999, the databanks held 2,570 million bases in 3.525 million sequence records across all species. The International Nucleotide Sequence Database Collaboration comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data daily. All three databanks are on the WWW and can easily be searched by many different keys (DNA sequence, protein name, author, species, disease name, etc). I have also included search windows for Nucleotide Sequences, Protein Sequences and Biomolecule 3D Structures. Another excellent source of DNA information, with easy-to-read formatting, is the "NiceProt View" produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation, the European Bioinformatics Institute (see note).

A comparison of the Dystrophin entry in the basic GenBank format and in the SWISS-PROT "NiceProt View" is shown.

A sample of the format of a single generic GenBank (NCBI) entry is also shown below.

DNA Databanks WWW

Nucleotide Sequence Search


Enter one or more author names, text words, or other keywords. To search for all terms that begin with a given word, place an asterisk (*) at the end of the word. Journal Titles must be MEDLINE abbreviations; Author names must be in the form LastName Initial(s), e.g. Smith BJ. The initials can be omitted. Detailed Help is available.
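The two input conventions described above (a trailing asterisk for prefix searches, and the LastName Initial(s) form for authors) can be illustrated with a short Python sketch. The helper names `matches_term` and `is_author_form` are hypothetical, the case-insensitive matching is an assumption, and nothing here reflects how the NCBI search engine itself is implemented.

```python
import re

# Hypothetical helpers illustrating the search-term conventions only.
# They are not part of any NCBI software.

def matches_term(term, word):
    """Return True if `word` satisfies the search `term`.

    A trailing asterisk means "any word beginning with this prefix";
    otherwise the whole word must match. Case is ignored (an assumption).
    """
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


# LastName, optionally followed by a space and 1-3 capital initials.
AUTHOR_PATTERN = re.compile(r"^[A-Z][A-Za-z'-]*( [A-Z]{1,3})?$")

def is_author_form(name):
    """Check that a name looks like 'Smith BJ' or just 'Smith'."""
    return AUTHOR_PATTERN.match(name) is not None
```

With these definitions, `matches_term("dystroph*", "Dystrophin")` is true while `matches_term("dystroph", "Dystrophin")` is not, and `is_author_form("Smith BJ")` accepts the form given in the help text.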

Protein Sequence Search


Enter one or more author names, text words, or other keywords. To search for all terms that begin with a given word, place an asterisk (*) at the end of the word. Journal Titles must be MEDLINE abbreviations; Author names must be in the form LastName Initial(s), e.g. Smith BJ. The initials can be omitted. Detailed Help is available.

Biomolecule 3D Structure Search


Enter one or more author last names, text words, or other keywords. To search for all terms that begin with a given word, place an asterisk (*) at the end of the word. Journal Titles must be MEDLINE abbreviations; Author names must be in the form LastName Initial(s), e.g. Smith BJ. The initials can be omitted. Detailed Help is available.

NCBI Databases: [Proteins] [3D Structures] [Genomes] [Taxonomy] [PubMed]

Genbank Entry Terms

The following brief descriptions relate to the terms listed for each GenBank entry. For official descriptions of these terms see the database.

LOCUS- A brief description giving any acronym for the gene, the number of nucleotides in the sequence, whether the sequence is DNA or RNA, and the date the entry was added.

DEFINITION- A more accurate description of the gene.

ACCESSION- A unique single letter and number code for the entry.

NID- A unique nucleotide identifier for the sequence, also a single letter and number code.

KEYWORDS- A list of keywords describing the gene and its product, which can be used when searching the sequence database.

SOURCE- the species, cell type etc from which the sequence was derived.

ORGANISM- A hierarchical classification of the organism.

For example, humans are listed as Homo sapiens, followed by the lineage: Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
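Because the classification is a semicolon-separated list running from the broadest group to the genus, it is easy to split into its levels. The following Python sketch shows one way to do this; the function name `parse_lineage` is hypothetical and the input is simply the human lineage quoted above.

```python
def parse_lineage(lineage):
    """Split a semicolon-separated lineage into a list, broadest group first."""
    cleaned = lineage.strip().rstrip(".")  # drop the closing full stop
    return [rank.strip() for rank in cleaned.split(";") if rank.strip()]


lineage = ("Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; "
           "Eutheria; Primates; Catarrhini; Hominidae; Homo.")
ranks = parse_lineage(lineage)
print(ranks[0], ranks[-1], len(ranks))  # Eukaryota Homo 10
```

Stripping whitespace around each name means the result is the same whether or not a space follows each semicolon.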

REFERENCE- A reference or list of references in which the sequence has been published or, more commonly these days, a direct submission to the database, possibly without publication (if the sequence was not "interesting" enough). Published references include author, address, paper title, journal, volume, page numbers and year. Useful when searching the database by author.

MEDLINE- The MEDLINE unique identifier number for the reference, which is also linked to the MEDLINE database.

FEATURES- A set of identifying features of the DNA: chromosome assignment, clone information, map position, length of DNA, start codon position, intron/exon boundaries, cross references to other databanks (such as SWISS-PROT), and the amino acid translation of the DNA sequence. Features are listed under the heading Location/Qualifiers, with each feature name followed by its qualifiers:

source
     /organism="Homo sapiens"
     /db_xref=""
     /chromosome=""
     /clone=""
     /map="p1 region 1A1"

gene
     /gene="3 letter code for gene"

CDS
     /gene=""
     /codon_start=1
     /db_xref="PID: #####"
     /db_xref="SWISS-PROT: ######"
     /translation="nnnnnnnnnnnn"
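Each qualifier has the form /name=value, with text values in double quotes, so the qualifiers of one feature can be gathered into a simple lookup table. The Python sketch below does this for the placeholder CDS qualifiers listed above. The function name `parse_qualifiers` is hypothetical, and the sketch assumes every qualifier fits on one line, which is often not true of long translations in real entries.

```python
import re

# /name="quoted text"  or  /name=unquoted_value
QUALIFIER = re.compile(r'^/(\w+)=(?:"(.*)"|(\S+))$')

def parse_qualifiers(lines):
    """Collect /name=value lines into a dict mapping name -> list of values.

    A list is used because a qualifier such as db_xref can occur
    more than once in the same feature.
    """
    result = {}
    for line in lines:
        m = QUALIFIER.match(line.strip())
        if m is None:
            continue  # not a single-line qualifier; ignored in this sketch
        value = m.group(2) if m.group(2) is not None else m.group(3)
        result.setdefault(m.group(1), []).append(value)
    return result


cds_lines = [
    '/gene=""',
    '/codon_start=1',
    '/db_xref="PID: #####"',
    '/db_xref="SWISS-PROT: ######"',
    '/translation="nnnnnnnnnnnn"',
]
cds = parse_qualifiers(cds_lines)
print(cds["db_xref"])  # ['PID: #####', 'SWISS-PROT: ######']
```

All values are kept as text, so codon_start is returned as the string "1" rather than a number.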

BASE COUNT- The total number of each of a, c, g and t in the sequence.

ORIGIN- A full list of all nucleotides in the sequence, numbered from 1, in blocks of 10 nucleotides, 60 nucleotides to a line, with each line starting with the number of the first nucleotide on that line. This organization allows easy extraction of the sequence by many DNA analysis computer programs. Note: Courier is a font that gives an equal space to each letter, therefore allowing easy alignment. Most other fonts are proportionally spaced.

1 aagctttgtt tttttaaaga taacatacac atatattgat aatgataaac aattcatata

61 gctttttgtg tcctctcgtt ttgtgacata aaaggtcaat gaaaaaattg gcgattaagt

//- indicates the end of the sequence.
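The regular layout of the ORIGIN block is what makes extraction by a program straightforward: drop the leading position number on each line, join the blocks of 10, and stop at the // line. A minimal Python sketch follows, using the two sample lines above; the function name `read_origin` is hypothetical, and the same pass also yields the figures reported under BASE COUNT.

```python
from collections import Counter

def read_origin(lines):
    """Join ORIGIN lines into a single sequence string.

    Each line starts with the position of its first base, followed by
    blocks of 10 bases. Reading stops at the "//" end-of-sequence marker.
    """
    parts = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("//"):
            break
        parts.extend(fields[1:])  # fields[0] is the position number
    return "".join(parts)


origin_lines = [
    "1 aagctttgtt tttttaaaga taacatacac atatattgat aatgataaac aattcatata",
    "61 gctttttgtg tcctctcgtt ttgtgacata aaaggtcaat gaaaaaattg gcgattaagt",
    "//",
]
sequence = read_origin(origin_lines)
base_count = Counter(sequence)  # number of a, c, g and t
print(len(sequence))  # 120
```

Because every full line holds 60 bases, the base at position 61 is the first base of the second line, `sequence[60]` in Python's zero-based indexing.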

 

  • The "NiceProt View" produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute is reproduced unchanged and with copyright statement attached in UNSW Embryology for educational purposes only.

m.hill@unsw.edu.au
Date Last Modified: 11/3/99
This site maintained by Dr M. Hill