trimseq
Specifically, it:
It then optionally trims off poor-quality regions from the ends, using a threshold percentage of unwanted characters in a window that is moved along the sequence from the ends. The unwanted characters are X's and N's (in nucleic sequences), optionally *'s, and optionally IUPAC ambiguity codes.
The program stops trimming the ends when the percentage of unwanted characters in the moving window drops below the threshold percentage.
Thus if the window size is set to 1 and the percentage threshold is 100, no further poor quality regions will be removed. If the window size is set to 5 and the percentage threshold is 40 then the sequence AAGCTNNNNATT will be trimmed to AAGCT, while AAGCTNATT or AAGCTNNNNATTT will not be trimmed as less than 40% of the last 5 characters are N's.
After trimming these poor-quality regions, it again trims off any dangling gap characters from the ends.
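The window rule described above can be sketched in Python. This is a minimal illustration written to reproduce the worked examples in the text (window 5 with threshold 40, and window 1 with threshold 100); it is not taken from the trimseq source, and the function names `trim_right` and `trim_ends` are hypothetical. It assumes a nucleic sequence where only N and X count as unwanted, and it does not handle gap characters.

```python
def trim_right(seq, window=1, percent=100.0, unwanted="NX"):
    """Sketch: remove a poor-quality region from the right end of seq.

    A window of `window` characters starts at the right end and moves
    inward one position at a time while at least `percent` percent of
    its characters are unwanted. The sequence is then cut at the
    innermost unwanted character seen in those windows.
    """
    bad_chars = set(unwanted.upper())
    cut = len(seq)   # index of the first character to discard
    end = len(seq)   # exclusive right edge of the current window
    while window > 0 and end >= window:
        start = end - window
        chunk = seq[start:end].upper()
        bad = [i for i, ch in enumerate(chunk) if ch in bad_chars]
        # Stop once the window falls below the threshold percentage.
        if not bad or 100.0 * len(bad) / window < percent:
            break
        cut = min(cut, start + bad[0])
        end -= 1
    return seq[:cut]


def trim_ends(seq, window=1, percent=100.0, left=True, right=True):
    """Sketch: apply the same rule to either or both ends."""
    if right:
        seq = trim_right(seq, window, percent)
    if left:
        # Trimming the left end is the right-end rule on the reversed sequence.
        seq = trim_right(seq[::-1], window, percent)[::-1]
    return seq


print(trim_right("AAGCTNNNNATT", window=5, percent=40))   # AAGCT
print(trim_right("AAGCTNATT", window=5, percent=40))      # AAGCTNATT
```

With a window of 1 and a threshold of 100, the sketch removes only unwanted characters that sit directly at the end, so `AAGCTNNNNATT` is returned unchanged.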
% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100
Tidy up the sequence ends, removing poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 5 -percent 40
Tidy up the sequence ends, removing very poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 80
Tidy up the sequence ends, removing even marginally poor bits at the ends
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 10
Tidy up the sequence ends, removing poor bits including ambiguity codes
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -strict
Tidy up the sequence ends, removing asterisks from a protein end
% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100 -star
Tidy up the sequence ends, removing poor bits at only the left end
% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -noright
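When these commands are generated from a script, it can help to build the argument list in one place. The following Python sketch assembles a trimseq command line using only the qualifiers shown on this page; the helper name `build_trimseq_command` is hypothetical, and the sketch assumes `trimseq` is available on the `PATH` when the command is eventually run.

```python
def build_trimseq_command(infile, outfile, window=1, percent=100.0,
                          strict=False, star=False, left=True, right=True):
    """Sketch: return a trimseq argument list for the given settings."""
    cmd = [
        "trimseq", infile, outfile,
        "-window", str(window),
        "-percent", f"{percent:g}",   # 100.0 is written as "100"
    ]
    if strict:
        cmd.append("-strict")    # also treat IUPAC ambiguity codes as unwanted
    if star:
        cmd.append("-star")      # also treat asterisks as unwanted (protein)
    if not left:
        cmd.append("-noleft")    # do not trim at the start
    if not right:
        cmd.append("-noright")   # do not trim at the end
    return cmd


print(" ".join(build_trimseq_command("xyz.seq", "xyz_clean.seq",
                                     window=20, percent=50, right=False)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library to execute the command.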
       Mandatory qualifiers:
      [-sequence]          seqall     Sequence database USA
      [-outseq]            seqoutall  Output sequence(s) USA

       Optional qualifiers:
       -window             integer    This determines the size of the region
                                      that is considered when deciding whether
                                      the percentage of ambiguity is greater
                                      than the threshold. A value of 5 means
                                      that a region of 5 letters in the
                                      sequence is shifted along the sequence
                                      from the ends and trimming is done only
                                      if there is a greater or equal
                                      percentage of ambiguity than the
                                      threshold percentage.
       -percent            float      This is the threshold of the percentage
                                      ambiguity in the window required in
                                      order to trim a sequence.
       -strict             boolean    In nucleic sequences, trim off not only
                                      N's and X's, but also the nucleotide
                                      IUPAC ambiguity codes M, R, W, S, Y, K,
                                      V, H, D and B. In protein sequences,
                                      trim off not only X's but also B and Z.
       -star               boolean    In protein sequences, trim off not only
                                      X's, but also the *'s

       Advanced qualifiers:
       -[no]left           boolean    Trim at the start
       -[no]right          boolean    Trim at the end

       General qualifiers:
       -help               boolean    Report command line options. More
                                      information on associated and general
                                      qualifiers can be found with -help
                                      -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
[-outseq] (Parameter 2) | Output sequence(s) USA | Writeable sequence(s) | <sequence>.format |

Optional qualifiers | Description | Allowed values | Default |
---|---|---|---|
-window | This determines the size of the region that is considered when deciding whether the percentage of ambiguity is greater than the threshold. A value of 5 means that a region of 5 letters in the sequence is shifted along the sequence from the ends and trimming is done only if there is a greater or equal percentage of ambiguity than the threshold percentage. | Any integer value | 1 |
-percent | This is the threshold of the percentage ambiguity in the window required in order to trim a sequence. | Any numeric value | 100.0 |
-strict | In nucleic sequences, trim off not only N's and X's, but also the nucleotide IUPAC ambiguity codes M, R, W, S, Y, K, V, H, D and B. In protein sequences, trim off not only X's but also B and Z. | Boolean value Yes/No | No |
-star | In protein sequences, trim off not only X's, but also the *'s | Boolean value Yes/No | No |

Advanced qualifiers | Description | Allowed values | Default |
---|---|---|---|
-[no]left | Trim at the start | Boolean value Yes/No | Yes |
-[no]right | Trim at the end | Boolean value Yes/No | Yes |
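
The effect of -strict and -star on which characters count as unwanted can be summarised in a few lines of Python. This is an illustrative sketch built from the qualifier descriptions above, not code from trimseq itself; the function name `unwanted_characters` is hypothetical, and applying -star only to protein sequences is an assumption based on the wording of that qualifier.

```python
NUCLEIC_AMBIGUITY = "MRWSYKVHDB"   # extra codes trimmed with -strict (nucleic)
PROTEIN_AMBIGUITY = "BZ"           # extra codes trimmed with -strict (protein)


def unwanted_characters(is_nucleic, strict=False, star=False):
    """Sketch: return the set of characters treated as unwanted."""
    # Baseline: N and X for nucleic sequences, X alone for protein.
    chars = set("NX") if is_nucleic else set("X")
    if strict:
        chars |= set(NUCLEIC_AMBIGUITY if is_nucleic else PROTEIN_AMBIGUITY)
    if star and not is_nucleic:
        # Assumption: asterisks are only considered in protein sequences.
        chars.add("*")
    return chars


print(sorted(unwanted_characters(is_nucleic=False, strict=True, star=True)))
```
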
Program name | Description |
---|---|
biosed | Replace or delete sequence sections |
cutseq | Removes a specified section from a sequence |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence |
entret | Reads and writes (returns) flatfile entries |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence |
listor | Writes a list file of the logical OR of two sets of sequences |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence |
newseq | Type in a short new sequence |
noreturn | Removes carriage return from ASCII files |
notseq | Excludes a set of sequences and writes out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
pasteseq | Insert one sequence into another |
revseq | Reverse and complement a sequence |
seqret | Reads and writes (returns) sequences |
seqretsplit | Reads and writes (returns) sequences in individual files |
skipseq | Reads and writes (returns) sequences, skipping the first few |
splitter | Split a sequence into (overlapping) smaller sequences |
trimest | Trim poly-A tails off EST sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
yank | Reads a sequence range, appends the full USA to a list file |