Sequence data may be submitted to the EMBL or GenBank databases using the following form. To get a file containing an empty copy of the form, use the command % fetch genbank.form. Fill it out with a text editor. Mailing instructions are included on the form. You can include the sequence itself in GCG format if you wish.
GBDAT.FRM Genetic Sequence Data Bank
15 August 1994
GenBank Flat File Release 84.0
GenBank Data Submission Form
196703 loci, 201815802 bases, from 196703 reported sequences
AUTHORIN from GenBank
If you have an IBM PC or Macintosh, we request that you use the
Authorin submission program rather than the submission form
below.
GENBANK ENLISTING COMMUNITY SUPPORT
To meet the goal of an accurate and current data bank, GenBank
and the scientific journals are requesting members of the
molecular biology community to submit their data directly to
the data bank.
Efforts to encourage author-initiated direct submission of
sequence data have been successful to date. Over 75% of new
sequences enter GenBank by these means.
THE SEQUENCE SUBMISSION PROGRAM: AUTHORIN
To facilitate authors' direct submission of data, GenBank has
developed an IBM PC and Macintosh program called Authorin. This is
an easy-to-use software tool designed to help researchers prepare
their sequence and annotation data for computer-readable
submission to GenBank, EMBL, DDBJ, or PIR.
Authors may enter their data in any order and may make
revisions at any time prior to submission. Partially completed
entries may be saved and finished in a later session. Menus are
provided for many of the fields to standardize terminology and
reduce typing.
SUBMITTING WITH AUTHORIN
Files generated by Authorin are simple text files and may be
copied to disk and mailed to the appropriate data bank.
Alternatively, the submission file may be transferred from your
personal computer to a computer system connected to the BITNET
or INTERNET networks and mailed electronically to the data
banks.
FULLY AUTOMATED DATA ENTRY
Authorin submissions received via electronic mail are
checked automatically by a suite of computer programs
designed to validate the sequence data and their
associated information. Validated data are passed directly
to the database. The submitter then receives an
electronic mail response containing the accession number
and a copy of the data as they appear in the database.
AUTHORIN AVAILABLE FREE
Authorin is available from GenBank at no charge. The program
is distributed for the IBM PC and the Macintosh. If you
would like to receive a copy of the Authorin program and
documentation, send your name and address to:
National Center for Biotechnology Information
National Library of Medicine
Room 8N-803 Bldg. 38A
8600 Rockville Pike
Bethesda, MD 20894
Tel: (301) 496-2475
E-mail: authorin@ncbi.nlm.nih.gov
****************************************************************************
SEQUENCE DATA SUBMISSION FORM
This form solicits the information needed for a nucleotide and/or amino acid
sequence data bank entry. By completing and returning it to us promptly you
help us to enter your data in the database accurately and rapidly.
Please answer all questions which apply to your data. If you submit two or
more non-contiguous sequences, please copy and fill out this form
for each additional sequence. Please include in your submission any
additional sequence data which are not reported in your manuscript but which
have been reliably determined (for example, introns or flanking sequences).
When submitting nucleic acid sequences containing protein coding regions,
please include a translation (SEPARATELY from the nucleic acid sequence).
Then send (1) this form, (2) a copy of your manuscript (if available) and
(3) your sequence data (in machine readable form) to the address shown below.
Information about the various ways you can send us your data and about
formats for the sequence data is given in the following sections.
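As a rough illustration of the separate translation requested above, the
following Python sketch translates a protein coding region using the standard
genetic code. The function name translate and the example sequences are
hypothetical and are not part of any GenBank or GCG tool. The sketch assumes
the input starts in frame at the first base of a codon.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding region into one-letter amino acid codes.

    Translation stops at the first stop codon. A trailing partial codon is
    ignored, and any codon containing an ambiguity code is reported as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Organisms or organelles that use a different genetic code would need a
different table; this sketch covers only the standard code.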
SUBMITTING DATA TO GENBANK
We can process sequence and annotation data submitted in any of the following
ways:
1. ELECTRONIC FILE TRANSFER: files can be sent via computer network
to gb-sub@ncbi.nlm.nih.gov. This address can be reached via various gateways
from BITNET, INTERNET, USENET, JANET, JUNET, etc. Ask your local network
expert how to send it or phone us for help at (301) 496-2475.
2. FLOPPY DISKS: Macintosh or DOS systems (all sizes and densities): if using
word processing software, the file should be sent as an ASCII text file rather
than as a software-specific file.
3. PRINTED COPY: as a last resort only! Please do not reduce the size of the
letters in the sequence.
Our address is:
GenBank Submissions
National Center for Biotechnology Information
National Library of Medicine
Room 8N-803, Bldg. 38A
8600 Rockville Pike
Bethesda, MD 20894
E-MAIL: gb-sub@ncbi.nlm.nih.gov
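Because files should be sent as ASCII text rather than in a word processor's
own format, it can help to check a file before mailing it. The following
Python sketch is one possible check. The function name non_ascii_positions is
hypothetical, and the sketch assumes the file has already been read into a
string.

```python
def non_ascii_positions(text: str) -> list[tuple[int, int, str]]:
    """Return (line number, column, character) for each non-ASCII character.

    Line and column numbers start at 1. An empty list means the text is
    plain ASCII.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                problems.append((line_no, col, ch))
    return problems


if __name__ == "__main__":
    sample = "ACGTACGT\nAC\u00e9T"
    for line_no, col, ch in non_ascii_positions(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```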
ACCESSION NUMBERS
An accession number is permanently assigned to each sequence submitted to the
database. We will assign an accession number upon receipt of this form
and return it to you within seven days, or contact you if there are
errors. We recommend that you cite this number when referring to both these
data and the article where they were originally reported. If you are
forwarding this number on to a journal, please send a photocopy or facsimile
of the notification received from GenBank; do not send the number over the
telephone.
If your manuscript has already been accepted for publication, the accession
number should be included at the galley proof stage as a note added in proof.
If the journal has not already provided a format, we suggest that the note
added to the manuscript or in the galley proof should be inserted as a
footnote on the title page and read approximately as follows: "The
nucleotide sequence data reported in this paper have been submitted to
GenBank and assigned the accession number M12345."
FORMATS FOR SUBMITTED DATA
We would appreciate receiving the sequence data in a form which conforms as
closely as possible to the following standards:
o Each sequence should include the names of the authors.
o Each distinct sequence should be listed separately using the same number
of bases/residues per line and clearly indicating its length in
bases/residues.
o Enumeration of distinct sequences should begin with a "1" and ascend in the
direction 5' to 3' (amino- to carboxy-terminus).
o Amino acid sequences should be listed using the one-letter code. The code
for representing the sequence characters should conform to the IUPAC-IUB
standards, which are described in the following references: Nucl. Acids
Res. 13: 3021-3030 (1985) for nucleotides, and J. Biol. Chem. 243:
3557-3559 (1968) for amino acids.
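The layout standards above can be sketched in Python as follows. This is only
an illustration: the names check_nucleotides and format_sequence, the choice
of 60 bases per line, and the position column are assumptions, not a format
required by GenBank. The character set is the IUPAC-IUB nucleotide code.

```python
# IUPAC-IUB nucleotide codes, including the ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def check_nucleotides(seq: str) -> set[str]:
    """Return the characters in seq that are not IUPAC nucleotide codes."""
    return set(seq.upper()) - IUPAC_NUCLEOTIDES


def format_sequence(seq: str, per_line: int = 60) -> list[str]:
    """Split seq into lines with the same number of bases per line.

    Each line is prefixed with the position of its first base, counting
    from 1 in the 5' to 3' direction.
    """
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        lines.append(f"{start + 1:>9} {chunk}")
    return lines


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    print(f"Length: {len(seq)} bases")
    for line in format_sequence(seq, per_line=4):
        print(line)
```

Printing the length first, as in the usage lines at the end, covers the
request to indicate clearly the length of each sequence.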
_________________________________________________
These data will be shared among the following databases: EMBL Data Library
(Heidelberg, Federal Republic of Germany); GenBank (NCBI, NIH, Bethesda, MD,
USA); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National Biomedical
Research Foundation Protein Identification Resource (NBRF-PIR; Washington, D.C.,
U.S.A.); Martinsried Institute for Protein Sequence Data (MIPS; Martinsried,
Federal Republic of Germany) and International Protein Information Database in
Japan (JIPID; Noda, Japan).
I. GENERAL INFORMATION
==============================================================================
Your last name first name middle initials
------------------------------------------------------------------------------
Institution
------------------------------------------------------------------------------
Address
------------------------------------------------------------------------------
Computer mail address Telex number
------------------------------------------------------------------------------
Telephone Telefax number
==============================================================================
On what medium and in what format are you sending us your sequence data?
(see instructions at the beginning of this form)
[ ] electronic mail
[ ] diskette
computer: operating system:
editor: filename:
[ ] magnetic tape (specify format)
==============================================================================
II. CITATION INFORMATION
==============================================================================
These data represent
[ ] new submission [ ] correction (if correction, accession number: )
==============================================================================
These data are [ ] published [ ] in press [ ] submitted [ ] in preparation
[ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
title of paper
------------------------------------------------------------------------------
journal volume, first-last pages, year
------------------------------------------------------------------------------
Do you agree that these data can be made available in the database before
they appear in print?
[ ] yes [ ] no, they can be made available after: (date)
==============================================================================
Does the sequence which you are sending with this form include data that
does NOT appear in the above citation?
[ ] no
[ ] yes, from position _______ to _______ [ ] bases OR
[ ] amino acid residues
(If your sequence contains 2 or more such spans, use the feature table
in section IV to indicate their positions)
If so, how should these data be cited in the database?
[ ] published [ ] in press [ ] submitted [ ] in preparation
[ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
address (if different from that given in section I)
------------------------------------------------------------------------------
title of paper
------------------------------------------------------------------------------
journal volume, first-last pages, year
==============================================================================
List references to papers and/or database entries which report sequences
overlapping with that submitted here.
1st author journal, vol., pages, year and/or database, accession number
------------------------------------------------------------------------------
------------------------------------------------------------------------------
==============================================================================
III. DESCRIPTION OF SEQUENCED SEGMENT
Wherever possible, please use standard nomenclature or conventions. If a
question is not applicable to your sequence, answer by writing N.A. in the
appropriate space; if the information is relevant but not available, write
a question mark (?).
==============================================================================
What kind of molecule did you sequence? (check all boxes which apply)
[ ] genomic DNA [ ] genomic RNA [ ] cDNA to mRNA [ ] cDNA to genomic RNA
[ ] organelle DNA [ ] organelle RNA please specify organelle:
[ ] tRNA [ ] rRNA [ ] snRNA [ ] scRNA
for viruses: [ ] virus or [ ] provirus or [ ] viroid [ ] DNA or [ ] RNA
[ ] ds or [ ] ss or [ ] circular [ ] enveloped
or [ ] nonenveloped
[ ] other nucleic acid. please specify:
[ ] peptide [ ] sequence assembled by [ ] overlap of sequenced fragments
[ ] homology with related sequence
[ ] other. please specify:
[ ] partial: [ ] N-terminal
[ ] C-terminal
[ ] internal fragment
==============================================================================
length of sequence [ ] bases or [ ] amino acid residues
------------------------------------------------------------------------------
gene name(s) (e.g., lacZ)
------------------------------------------------------------------------------
gene product name(s) (e.g., beta-D-galactosidase)
------------------------------------------------------------------------------
Enzyme Commission number (e.g., EC 3.2.1.23)
------------------------------------------------------------------------------
gene product subunit structure (e.g., hemoglobin alpha-2 beta-2)
==============================================================================
The following items refer to the original source of the molecule you have
sequenced.
organism (species) (e.g., Mus musculus) plant cultivar
------------------------------------------------------------------------------
strain (e.g., K12, BALB/c) substrain
------------------------------------------------------------------------------
name/number of individual/isolate (e.g., patient 123; influenza virus
A/PR/8/34)
------------------------------------------------------------------------------
developmental stage [ ] germ line [ ] rearranged
------------------------------------------------------------------------------
haplotype tissue type cell type
------------------------------------------------------------------------------
allele variant [ ] macronuclear
==============================================================================
The following items refer to the immediate experimental source of the
submitted sequence.
name of cell line (e.g., HeLa; 3T3-L1) or plant cultivar
------------------------------------------------------------------------------
clone library clone(s), subclone(s)
==============================================================================
The following items refer to the position of the submitted sequence in the
genome.
chromosome (or segment) name/number
------------------------------------------------------------------------------
map position units: [ ] genome % [ ] nucleotide number
[ ] other:
==============================================================================
Using single words or short phrases, describe the properties of the sequence
in terms of:
- its associated phenotype(s);
- the biological/enzymatic activity of its product;
- the general functional classification of the gene and/or gene product;
- macromolecules to which the gene product can bind (e.g., DNA, calcium,
other proteins);
- subcellular localization of the gene product;
- any other relevant information.
Example (for the viral erbB nucleotide sequence): transforming capacity; EGF
receptor-related; tyrosine kinase; oncogene; transmembrane protein.
==============================================================================
IV. FEATURES OF THE SEQUENCE
Please list below the types and locations of all significant features
experimentally identified within the sequence. Be sure that your sequence
is numbered beginning with "1." Use < or > if a feature extends beyond the
beginning or end of the indicated sequence span.
In the column marked    fill in
feature                 type of feature (see information below)
from                    number of first base/amino acid in the feature
to                      number of last base/amino acid in the feature
bp                      an "x" if numbering refers to position of a base pair
                        in a nucleotide sequence
aa                      an "x" if numbering refers to position of an amino
                        acid residue in a peptide sequence
id                      indicate method by which the feature was identified.
                        E = experimentally; S = by similarity with known
                        sequence or to an established consensus sequence; P =
                        by similarity to some other pattern, such as an
                        open reading frame
comp                    an "x" for a nucleotide sequence feature located on
                        strand complementary to that reported here
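The from and to columns, together with the < and > markers, can be read by a
program along the following lines. This Python sketch is illustrative only:
the names FeatureSpan and parse_span are hypothetical, and it assumes the
marker is written directly in front of the number, as in the ">264" example
in the table below.

```python
from dataclasses import dataclass


@dataclass
class FeatureSpan:
    start: int            # number of the first base/amino acid
    end: int              # number of the last base/amino acid
    starts_before: bool   # True if "from" was marked with "<"
    ends_after: bool      # True if "to" was marked with ">"


def parse_span(from_field: str, to_field: str) -> FeatureSpan:
    """Parse the "from" and "to" columns of one feature table row."""
    from_field = from_field.strip()
    to_field = to_field.strip()
    starts_before = from_field.startswith("<")
    ends_after = to_field.startswith(">")
    start = int(from_field.lstrip("<"))
    end = int(to_field.lstrip(">"))
    if start < 1:
        raise ValueError("sequence numbering must begin with 1")
    if end < start:
        raise ValueError("'to' must not be smaller than 'from'")
    return FeatureSpan(start, end, starts_before, ends_after)


if __name__ == "__main__":
    # The second example row of the table: exon 1, from 9 to >264.
    print(parse_span("9", ">264"))
```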
Significant features include:
- regulatory signals (e.g., promoters, attenuators, enhancers)
- transcribed regions (e.g., mRNA, rRNA, tRNA). (indicate reading frame
if start and stop codons are not present)
- regions subject to post-transcriptional modification (e.g., introns,
modified bases)
- translated regions
- extent of signal peptide, prepropeptide, propeptide, mature peptide
- regions subject to post-translational modification (e.g., glycosylated
or phosphorylated sites)
- other domains/sites of interest (e.g., extracellular domain,
DNA-binding domain, active site, inhibitory site)
- sites involved in bonding (disulfide, thiolester, intrachain, interchain)
- regions of protein secondary structure (e.g., alpha helix or beta sheet)
- conflicts with sequence data reported by other authors
- variations and polymorphisms
The first 2 lines of the table are filled in with examples.
==============================================================================
Numbering for features on submitted sequence [ ] matches manuscript
[ ] does not match manuscript
==============================================================================
feature                        from     to      bp   aa   id   comp
------------------------------------------------------------------------------
EXAMPLE TATA box               1        8       x         S
------------------------------------------------------------------------------
EXAMPLE exon 1                 9        >264    x
==============================================================================
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
==============================================================================
V. SEQUENCE DATA
Please enter the nucleotide sequence data here:
Please enter the translated amino acid sequence here:
E6/11.90 (Last change: 09-Jan-1992)
GenBank ERROR / SUGGESTION REPORT FORM
GENERAL INSTRUCTIONS
This form should be used to report errors in GenBank data and to submit
suggestions. Your suggestions help us to keep GenBank data up-to-date and
accurate. We welcome your input.
Please answer all questions which apply to the problem or suggestion. If you
report two or more separate problems, please copy and fill out this form for
each additional report. You may fill out the computer-readable form using a
text editor or print the form and fill it out by hand.
Please send the form(s) to:
GenBank Updates
National Center for Biotechnology Information
National Library of Medicine
Room 8N-803, Bldg. 38A
8600 Rockville Pike
Bethesda, MD 20894
E-mail: update@ncbi.nlm.nih.gov
Phone: (301) 496-2475
Please be sure to include the primary (first) accession number and locus name
of all entries affected. The form is reproduced below.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name: _________________________________________________________________________
Department: ________________________ Institution: ____________________________
Mailing Address: _______________________________________ Phone: _____________
City: ______________________________ State: ___________ Zip: _______________
Electronic Mail Address: ______________________________________________________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Type of Report: [ ] error [ ] problem [ ] suggestion [ ] comment [ ] other
Release of GenBank to which this applies: ___________________________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entry Information:
Primary Accession Number(s): Entry Name(s):
Division (check one or more):
[ ] BCT [ ] INV [ ] MAM [ ] ORG [ ] PHG [ ] PLN [ ] PRI
[ ] ROD [ ] RNA [ ] SYN [ ] UNA [ ] VRL [ ] VRT
_______________________________________________________________________________
Field Type (check one or more):
[ ] Locus line [ ] Source [ ] Comment [ ] Origin
[ ] Accession [ ] Organism [ ] Features [ ] Sequence
[ ] Keywords [ ] Reference [ ] Base Count [ ] Other ___________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Report / Suggestion
(please be as precise as possible - attach pages if necessary)
___________________________________________________________________________
Data presently in field:
___________________________________________________________________________
Proposed Change:
___________________________________________________________________________
Reason for Change:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GenBank Staff Use Only:
[ ] Change made, Release __________ [ ] Reply, Date _______________
[ ] Approved by ___________________________________________________________
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996 Genetics Computer Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.