The following brief descriptions relate to the
terms listed for each GenBank entry. For official
descriptions of these terms see the database.
LOCUS- a brief description
giving any acronym for the gene, the number of
nucleotides in the sequence, whether it is a DNA
or RNA sequence, and the date the entry was added.
DEFINITION- A more accurate
description of the gene.
ACCESSION- A unique single letter and
number code for the entry.
NID- A unique single letter and number
code for the entry.
KEYWORDS- A list of keywords that
describe the gene and its product and that can be
used when searching the database of sequences.
SOURCE- the species, cell type, etc.,
from which the sequence was derived.
ORGANISM- a hierarchical
classification of the organism.
For example, humans are listed as:
Homo sapiens
Eukaryota; Metazoa; Chordata;
Vertebrata; Mammalia; Eutheria; Primates;
Catarrhini; Hominidae; Homo.
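The lineage above is a semicolon-separated list that ends in a full stop, so it can be turned into a list of ranks with a few string operations. The following is a minimal sketch in Python; the function name parse_lineage is hypothetical and is not part of any GenBank tool.

```python
def parse_lineage(text):
    """Split a semicolon-separated ORGANISM lineage into a list of names."""
    # Join wrapped lines into one string and drop the trailing full stop.
    joined = " ".join(text.split()).rstrip(".")
    # Split on ';' and discard surrounding spaces and empty pieces.
    return [part.strip() for part in joined.split(";") if part.strip()]


lineage = parse_lineage(
    "Eukaryota; Metazoa; Chordata;\n"
    "Vertebrata; Mammalia; Eutheria; Primates;\n"
    "Catarrhini; Hominidae; Homo."
)
print(lineage[0], lineage[-1])
```

The first element is the broadest group and the last is the genus, which makes it easy to test whether an entry belongs to a given group, for example "Mammalia" in lineage.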
REFERENCE- a reference or list of
references in which the sequence has been
published or, more commonly these days, a direct
submission to the database, possibly without
publication (if the sequence was not
"interesting" enough). Published references
include author, address, paper title, journal,
volume, page numbers and year. Useful when
searching the database by author.
MEDLINE- The Medline unique identifier
number for the reference. It also links to the
Medline database.
FEATURES- A set of identifying
features of the DNA, such as chromosome
assignment, clone information, map position,
length of DNA, start codon position, intron/exon
boundaries, cross references to other databanks
(such as SWISS-PROT) and the amino acid
translation of the DNA sequence.
Location/Qualifiers
source
    /organism="Homo sapiens"
    /db_xref=""
    /chromosome=""
    /clone=""
    /map="p1 region 1A1"
gene
    /gene="3 letter code for gene"
CDS
    /gene=""
    /codon_start=1
    /db_xref="PID: #####"
    /db_xref="SWISS-PROT: ######"
    /translation="nnnnnnnnnnnn"
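Each qualifier in the feature table has the form /name=value, so the lines under a feature key can be collected into a dictionary. The sketch below is illustrative only: parse_qualifiers and QUALIFIER are hypothetical names, and it assumes every qualifier sits on a single line (in real entries long values such as /translation wrap over several lines, which this sketch does not handle).

```python
import re

# A qualifier line looks like /name=value, with the value optionally quoted.
QUALIFIER = re.compile(r"^/(\w+)=(.*)$")


def parse_qualifiers(lines):
    """Collect /name=value qualifiers into a dict of lists."""
    qualifiers = {}
    for line in lines:
        match = QUALIFIER.match(line.strip())
        if match is None:
            continue  # feature keys such as "source" or "CDS" are skipped
        name, value = match.groups()
        # A name such as db_xref can occur more than once, so keep a list.
        qualifiers.setdefault(name, []).append(value.strip('"'))
    return qualifiers


example = [
    "CDS",
    "/codon_start=1",
    '/db_xref="PID: #####"',
    '/db_xref="SWISS-PROT: ######"',
]
quals = parse_qualifiers(example)
print(quals["codon_start"])
print(quals["db_xref"])
```

Values are stored in lists because the same qualifier name can be repeated within one feature, as db_xref is in the example above.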
BASE COUNT- the total number of each of
a, c, g and t in the sequence.
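The same totals can be recomputed from the sequence itself, which is a simple way to check an entry. A minimal Python sketch follows; base_count is a hypothetical helper name, and the input is the first block of ten nucleotides from the example sequence in this document.

```python
from collections import Counter


def base_count(sequence):
    """Return how many times each of a, c, g and t occurs in the sequence."""
    counts = Counter(sequence.lower())
    # Report the four bases in a fixed order; anything else is ignored.
    return {base: counts[base] for base in "acgt"}


print(base_count("aagctttgtt"))
```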
ORIGIN- A full list of all nucleotides
in the sequence, numbered from 1, in blocks of
10 nucleotides, 60 nucleotides to a line, with
each line starting with the number of the first
nucleotide on the line. This organization allows
easy extraction of the sequence by many DNA
analysis computer programs. Note: Courier
is a font that gives equal space to
each letter, which allows easy alignment.
Most other fonts are proportionally spaced.
 1 aagctttgtt tttttaaaga taacatacac atatattgat aatgataaac aattcatata
61 gctttttgtg tcctctcgtt ttgtgacata aaaggtcaat gaaaaaattg gcgattaagt
//- indicates the end of the
sequence.
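Because the sequence lines follow a fixed pattern (a position number, then blocks of letters, ending at the // line), extracting the bare sequence needs very little code. The sketch below is one possible approach in Python, not the method used by any particular analysis program. The name extract_sequence is hypothetical, and it assumes the ORIGIN keyword and the // terminator each start a line.

```python
def extract_sequence(record_text):
    """Return the nucleotides found between the ORIGIN line and //."""
    blocks = []
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break  # end of the entry
        if in_origin:
            # Keep letters only: this drops the leading position number
            # and the spaces between the blocks of 10.
            blocks.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(blocks)


record = """ORIGIN
        1 aagctttgtt tttttaaaga taacatacac atatattgat aatgataaac aattcatata
       61 gctttttgtg tcctctcgtt ttgtgacata aaaggtcaat gaaaaaattg gcgattaagt
//
"""
sequence = extract_sequence(record)
print(len(sequence))  # 120
```

The two example lines hold 60 nucleotides each, so the extracted sequence is 120 letters long.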