READSEQ - Frequently Asked Questions

What is READSEQ?

READSEQ reads and writes nucleic/protein sequences in various formats. Data files may have multiple sequences. READSEQ was written by D.G. Gilbert at the Biology Department, Indiana University, Bloomington.
The program may be freely copied and used by anyone. Developers are encouraged to incorporate parts of it in their programs, rather than devise their own private sequence format.

Where can I get READSEQ?

This program is available thru Internet gopher, as

gopher ftp.bio.indiana.edu
browse into the IUBio-Software+Data/molbio/readseq/ folder
select the readseq.shar document

Or thru anonymous FTP in this manner:
my_computer> ftp ftp.bio.indiana.edu (or IP address 129.79.224.25)
username: anonymous
password: my_username@my_computer
ftp> cd molbio/readseq
ftp> get readseq.shar
ftp> bye

readseq.shar is a Unix shell archive of the READSEQ files. Those who do not have a Unix system or an Unshar program can edit this file with any text editor to reconstitute the original files. Read the top of this .shar file for further instructions.
There are also pre-compiled executables for the following computers: Silicon Graphics Iris, Sparc (Sun Sparcstation & clones), VMS-Vax, Macintosh. Use binary ftp to transfer these, except Macintosh. The Mac version is just the command-line program in a window, not very handy.

Which sequence formats can I use with READSEQ?

Readseq is particularly useful as it automatically detects many sequence formats, and interconverts among them.
Formats which readseq currently understands:

IG/Stanford, used by Intelligenetics and others

GenBank/GB, genbank flatfile format

NBRF format

EMBL, EMBL flatfile format

GCG, single sequence format of GCG software

DNAStrider, for common Mac program

Fitch format, limited use

Pearson/Fasta, a common format used by Fasta programs and others

Phylip3.2, sequential format for Phylip programs

Phylip, interleaved format for Phylip programs (v3.3, v3.4)

Plain/Raw, sequence data only (no name, document, numbering)

MSF, multiple sequence format used by GCG software

PAUP's multiple sequence (NEXUS) format

PIR/CODATA format used by PIR

ASN.1 format used by NCBI

!!! If you wish to process your results with "CLUSTALW" use PIR/CODATA or Pearson/Fasta format !!!
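
For readers unfamiliar with Pearson/Fasta, the following Python sketch shows the general shape of a record in that format: a header line starting with ">" followed by the sequence on wrapped lines. It is an illustration only, not READSEQ's output routine; the function name to_fasta and the 60-column line width are assumptions.

```python
def to_fasta(name, seq, description="", width=60):
    """Return one Pearson/Fasta record as text (hypothetical helper)."""
    # Header line: '>' followed by the name and an optional description.
    header = f">{name} {description}".rstrip()
    # Wrap the sequence into fixed-width lines (width is an assumed default).
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    print(to_fasta("seq1", "ACGT" * 20, "example sequence"), end="")
```

Several records written one after another form a multi-sequence Pearson/Fasta file.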

Which parameters can I use with this web-based READSEQ version?

The web-based version of READSEQ has only a reduced set of parameters which can be adjusted:

DIRECTION: Use "forward" or "reverse" (i.e., reverse-complement) sequence

LETTER CASE: Use "upper case" or "lower case" letters

BLANK LINES: Number of blank line(s) between sequence blocks
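
As an illustration of the DIRECTION and LETTER CASE options, the following Python sketch shows what reverse-complementing and case conversion do to a nucleotide sequence. It is not READSEQ's own code; the function names and option values are hypothetical, and only the four unambiguous DNA bases are complemented (any other symbol is passed through unchanged).

```python
# Complement table for the unambiguous DNA bases, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the order of the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_options(seq, direction="forward", letter_case="upper"):
    """Apply hypothetical DIRECTION and LETTER CASE settings to a sequence."""
    if direction == "reverse":
        seq = reverse_complement(seq)
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'reverse'")
    if letter_case == "upper":
        return seq.upper()
    if letter_case == "lower":
        return seq.lower()
    raise ValueError("letter_case must be 'upper' or 'lower'")


if __name__ == "__main__":
    print(apply_options("atgc", direction="reverse", letter_case="upper"))
```

Note that "reverse" here means the reverse-complement strand, not merely the sequence read backwards; that distinction only makes sense for nucleic acid sequences.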

How does READSEQ work?

The auto-detection feature of READSEQ, which distinguishes file formats, looks for some of the unique keywords and symbols that are found in each format. It is not infallible at this, though it attempts to exclude unknown formats. In general, if you feed READSEQ a sequence file that you know is in one of these common formats, you are okay. If you feed it data in an oddball format, or non-sequence data, you might well get garbage results. Also, developers are always thinking up minor twists on these common formats (like PAUP requiring a blank line between blocks of Phylip format, or IG adding form feeds between sequences), which may cause hassles.
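
To make the idea of keyword-based detection concrete, here is a minimal Python sketch. It is not READSEQ's actual detection logic, which this FAQ does not spell out; the function name guess_format and its handful of rules are assumptions based on the well-known leading markers of a few of the formats listed earlier.

```python
def guess_format(text):
    """Guess a sequence format from the first non-blank line (rough sketch)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.upper().startswith("#NEXUS"):
        return "PAUP/NEXUS"      # NEXUS files begin with the #NEXUS token
    if first.startswith("LOCUS"):
        return "GenBank"         # GenBank flatfiles begin with a LOCUS line
    if first.startswith("ID   "):
        return "EMBL"            # EMBL flatfiles begin with an ID line
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "NBRF"            # e.g. ">P1;NAME" or ">DL;NAME"
    if first.startswith(">"):
        return "Pearson/Fasta"   # plain ">" header line
    if first.startswith(";"):
        return "IG/Stanford"     # IG records start with ";" comment lines
    return "unknown"             # refuse to guess rather than emit garbage


if __name__ == "__main__":
    print(guess_format(">seq1 example\nACGTACGT\n"))
```

A detector like this shows why minor format variants cause trouble: a single unexpected leading line is enough to send the input down the wrong branch or into "unknown".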
In general, output supports only the minimal subset of each format needed for sequence data exchange. Features, descriptions and other format-unique information are discarded.
READSEQ is NOT optimized for LARGE files. It generally makes several reads thru each input file (one per sequence output at present; a future version may optimize this). It should handle input and output files and sequences of any size, but will slow down quite a bit for very large (multi-megabyte) files. It is NOT recommended for converting databanks or large subsets thereof. It is primarily directed at the small files that researchers use to maintain their personal data, which they frequently need to interconvert for the various analysis programs that so often require a special format.
Warning: Phylip format input is now supported (30Dec92), however the auto-detection of Phylip format is very probabilistic and messy, especially distinguishing sequential from interleaved versions. It is not recommended that one use readseq to convert files from Phylip format to others unless essential.
