What is READSEQ?
READSEQ reads and writes nucleic acid and protein sequences in various formats. Data files may contain
multiple sequences. READSEQ was written by D.G. Gilbert
at the Biology Department, Indiana University, Bloomington.
The program may be freely copied and used by anyone. Developers are encouraged to incorporate parts of it
in their programs rather than devise their own private sequence formats.
Where can I get READSEQ?
This program is available through Internet gopher:
gopher ftp.bio.indiana.edu
browse into the IUBio-Software+Data/molbio/readseq/ folder
select the readseq.shar document
Or through anonymous FTP, as follows:
my_computer> ftp ftp.bio.indiana.edu (or IP address 129.79.224.25)
username: anonymous
password: my_username@my_computer
ftp> cd molbio/readseq
ftp> get readseq.shar
ftp> bye
readseq.shar is a Unix shell archive of the READSEQ files. If you do not have a Unix system or an
unshar program, you can edit this file with any text editor to reconstitute the original files.
Read the top of this .shar file for further instructions.
There are also pre-compiled executables for the following computers: Silicon Graphics Iris, Sparc
(Sun Sparcstation & clones), VMS-Vax and Macintosh. Use binary FTP mode to transfer all of these
except the Macintosh version. The Mac version is just the command-line program in a window, so it is
not very convenient.
Which sequence formats can I use with READSEQ?
READSEQ is particularly useful because it automatically detects many sequence formats and
interconverts among them.
Formats that READSEQ currently understands:
- IG/Stanford, used by Intelligenetics and others
- GenBank/GB, GenBank flatfile format
- EMBL, EMBL flatfile format
- GCG, single sequence format of GCG software
- DNAStrider, used by a common Mac program
- Fitch format, of limited use
- Pearson/Fasta, a common format used by Fasta programs and others
- Phylip3.2, sequential format for Phylip programs
- Phylip, interleaved format for Phylip programs (v3.3, v3.4)
- Plain/Raw, sequence data only (no name, document, numbering)
- MSF, multiple sequence format used by GCG software
- PAUP's multiple sequence (NEXUS) format
- PIR/CODATA format used by PIR
- ASN.1 format used by NCBI
!!! If you wish to process your results with CLUSTALW, use PIR/CODATA or
Pearson/Fasta format. !!!
Which parameters can I use with this web-based READSEQ version?
The web-based version of READSEQ exposes only a reduced set of adjustable parameters:
- DIRECTION: Use the "forward" or "reverse" (i.e. reverse-complement) sequence
- LETTER CASE: Use "upper case" or "lower case" letters
- BLANK LINES: Number of blank lines between sequence blocks
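The DIRECTION and LETTER CASE options can be illustrated with a minimal Python sketch. It assumes nucleotide input written with IUPAC codes (RNA U is complemented to A), and the function names are hypothetical; this is not READSEQ's own implementation.

```python
# Hypothetical sketch of the DIRECTION and LETTER CASE options; not READSEQ's code.
# Assumes nucleotide input using IUPAC codes; U (RNA) is complemented to A.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMSWBDHVNacgturykmswbdhvn",
    "TGCAAYRMKSWVHDBNtgcaayrmkswvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order of the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_options(seq: str, direction: str = "forward", case: str = "upper") -> str:
    """Apply the (assumed) direction and letter-case settings to one sequence."""
    if direction == "reverse":
        seq = reverse_complement(seq)
    elif direction != "forward":
        raise ValueError(f"unknown direction: {direction!r}")
    if case == "upper":
        return seq.upper()
    if case == "lower":
        return seq.lower()
    raise ValueError(f"unknown case: {case!r}")
```

For example, `apply_options("ATGCn", "reverse", "upper")` complements to `TACGn`, reverses to `nGCAT`, and upper-cases the result to `NGCAT`.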
How does READSEQ work?
The auto-detection feature of READSEQ distinguishes file formats by looking for the unique
keywords and symbols found in each format. It is not infallible, though it attempts to reject
unknown formats. In general, if you give READSEQ a sequence file that you know is in one of these
common formats, you should be fine. If you give it data in unusual formats, or non-sequence data,
you may well get garbage results. Also, developers keep introducing minor variations on these
common formats (such as PAUP requiring a blank line between blocks of Phylip format, or IG adding
form feeds between sequences), which may cause problems.
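Keyword-based detection of this kind can be sketched as a short list of rules checked in order. The rules below are simplified assumptions chosen for illustration (a GenBank entry starting with a LOCUS line, EMBL with an ID line, NEXUS with #NEXUS, Pearson/Fasta with '>', IG/Stanford with ';' comment lines, Phylip with two integers on the first line), not READSEQ's actual heuristics. Like the real program, such a detector is easily fooled by unusual input.

```python
import re


def detect_format(text: str) -> str:
    """Guess a sequence file format from characteristic keywords.

    Hypothetical, simplified rules for illustration only; they are checked
    in order and the first match wins. Returns "unknown" if nothing matches.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.startswith("LOCUS"):
        return "GenBank"
    if first.startswith("ID "):
        return "EMBL"
    if first.upper().startswith("#NEXUS"):
        return "PAUP/NEXUS"
    # GCG MSF header lines contain "MSF:" and end with "..".
    if any("MSF:" in ln and ".." in ln for ln in lines):
        return "MSF"
    if first.startswith(">"):
        return "Pearson/Fasta"
    if first.startswith(";"):
        return "IG/Stanford"
    # Phylip files begin with two integers: number of sequences, sequence length.
    if re.fullmatch(r"\s*\d+\s+\d+.*", first):
        return "Phylip"
    return "unknown"
```

The ordering is a design choice: more specific keywords are tested before generic ones such as '>' or a leading number, which is one reason why loosely defined formats like Phylip are the hardest to detect reliably.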
In general, output supports only the minimal subset of each format needed for exchanging sequence
data. Features, descriptions and other format-specific information are discarded.
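As an illustration of such a minimal subset, the hypothetical sketch below writes Pearson/Fasta output that keeps only each sequence's name and residues; any features or descriptions the input carried are simply not represented. The (name, sequence) record shape and the 60-residue default line width are assumptions, not READSEQ settings.

```python
from typing import Iterable, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Write (name, sequence) pairs as Pearson/Fasta text.

    Only the name and residues are kept; everything else is discarded.
    The 60-character line width is an assumed default.
    """
    out = []
    for name, seq in records:
        out.append(f">{name}")
        # Wrap the sequence into lines of at most `width` residues.
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n" if out else ""
```

For example, `write_fasta([("s1", "ACGTACGT")], width=4)` yields a `>s1` header followed by two lines of four residues each.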
READSEQ is NOT optimized for LARGE files. It generally makes several passes through each input file
(currently one per output sequence; a future version may optimize this). It should handle input and
output files and sequences of any size, but it slows down considerably on very large (multi-megabyte)
files. It is NOT recommended for converting databanks or large subsets thereof. It is primarily
intended for the small files that researchers use to maintain their personal data, which they often
need to convert into the particular formats that different analysis programs require.
Warning: Phylip format input is now supported (30Dec92); however, auto-detection of
Phylip format is very probabilistic and messy, especially when distinguishing the sequential from
the interleaved version. Avoid using READSEQ to convert files from Phylip format to other formats
unless it is essential.