rebaseextract

 

Function

Extract data from REBASE

Description

The Restriction Enzyme database (REBASE) is a collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Most recently, putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed.

The home page of REBASE is: http://rebase.neb.com/

This program derives recognition site and cleavage information from the "withrefm" file of a REBASE distribution. It creates three files in the EMBOSS data subdirectory REBASE: a pattern file, a reference file and a supplier file.
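To make "recognition site and cleavage information" concrete, here is a minimal Python sketch of how one REBASE recognition sequence could be interpreted. It is not the EMBOSS implementation. It assumes the notation described in the withrefm header: a ^ marks the cut point, ambiguity codes follow the standard abbreviations, and enzymes that cut away from their site carry a trailing pair such as (5/10). The function name, the returned keys and the regular expression are illustrative choices. Enzymes with four cleavage sites, such as BcgI, are not handled.

```python
import re

# Standard ambiguity codes, as listed in the withrefm header.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[GC]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Bases with an optional ^, optionally followed by "(top/bottom)".
SITE_RE = re.compile(r"^([A-Z^]+)\s*(?:\((-?\d+)/(-?\d+)\))?$")


def parse_recognition_site(site):
    """Return bases, cut position and a search regex for one site string."""
    match = SITE_RE.match(site.strip())
    if match is None:
        raise ValueError("unrecognised site notation: %r" % site)
    marked, top, bottom = match.groups()
    cut = marked.find("^")            # bases to the left of ^, or -1
    bases = marked.replace("^", "")
    regex = re.compile("".join(IUPAC[base] for base in bases))
    return {
        "bases": bases,
        "cut_after": cut if cut >= 0 else None,
        "downstream": (int(top), int(bottom)) if top is not None else None,
        "regex": regex,
    }


if __name__ == "__main__":
    hspai = parse_recognition_site("G^CGC")
    print(hspai["bases"], hspai["cut_after"])
    print(parse_recognition_site("GACGC (5/10)")["downstream"])
```

The cut position is stored as the number of bases preceding the ^, so the same record can be applied at any match position in a target sequence.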

The EMBOSS programs that find restriction cutting sites use the data files produced by this program and will not work without them.

Running this program may be the job of your system manager.

Usage

Here is a sample session with rebaseextract


% rebaseextract 
Extract data from REBASE
Full pathname of WITHREFM: ../../data/withrefm


Command line arguments

   Mandatory qualifiers:
  [-inf]               infile     Full pathname of WITHREFM

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Qualifier               Description                 Allowed values   Default

Mandatory qualifiers
[-inf] (Parameter 1)    Full pathname of WITHREFM   Input file       Required

Optional qualifiers
(none)

Advanced qualifiers
(none)

Input file format

The input file must be the "withrefm" file of a REBASE distribution.

For example, the withrefm file for REBASE version 005 is at: ftp://ftp.neb.com/pub/rebase/withrefm.005

Input files for usage example

File: ../../data/withrefm

 
REBASE version 106                                              withrefm.106
 
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database   http://rebase.neb.com
    Copyright (c)  Dr. Richard J. Roberts, 2001.   All rights reserved.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
Rich Roberts                                                    May 31 2001
 

<ENZYME NAME>   Restriction enzyme name.
<ISOSCHIZOMERS> Other enzymes with this specificity.
<RECOGNITION SEQUENCE> 
                These are written from 5' to 3', only one strand being given.
                If the point of cleavage has been determined, the precise site
                is marked with ^.  For enzymes such as HgaI, MboII etc., which
                cleave away from their recognition sequence the cleavage sites
                are indicated in parentheses.  

                For example HgaI GACGC (5/10) indicates cleavage as follows:
                                5' GACGCNNNNN^      3'
                                3' CTGCGNNNNNNNNNN^ 5'

                In all cases the recognition sequences are oriented so that
                the cleavage sites lie on their 3' side.

                REBASE Recognition sequences representations use the standard 
                abbreviations (Eur. J. Biochem. 150: 1-5, 1985) to represent 
                ambiguity.
                                R = G or A
                                Y = C or T
                                M = A or C
                                K = G or T
                                S = G or C
                                W = A or T
                                B = not A (C or G or T)
                                D = not C (A or G or T)
                                H = not G (A or C or T)
                                V = not T (A or C or G)
                                N = A or C or G or T



                ENZYMES WITH UNUSUAL CLEAVAGE PROPERTIES:  

                Enzymes that cut on both sides of their recognition sequences,
                such as BcgI, Bsp24I, CjeI and CjePI, have 4 cleavage sites
                each instead of 2.



  [Part of this file has been deleted for brevity]

<6>S.A. Thompson
<7>N
<8>Morgan, R.D., Unpublished observations.
Morgan, R.D., Xu, Q., US Patent Office, 2001.
Xu, Q., Morgan, R., Blaser, M., Unpublished observations.

<1>HspAI
<2>HhaI,AspLEI,BcaI,BspLAI,BstHHI,CcoP95I,CfoI,Csp1470I,FnuDIII,Hin6I,Hin7I,HinGUI,HinP1I,HinS1I,HinS2I,Hpy99III,HpyF10I,HsoI,MnnIV,NgoEII,SciNI
<3>G^CGC
<4>
<5>Haemophilus species A
<6>S.K. Degtyarev
<7>I
<8>Rechkunova, N.I., Prikhod'ko, E.A., Shevchenko, A.V., Degtyarev, S.K., Unpublished observations.

<1>KpnI
<2>Acc65I,AhaB8I,Asp718I,BspJ106I,Eco149I,Esp19I,KpnK14I,MvsI,MvsAI,MvsBI,MvsCI,MvsDI,MvsEI,NmiI,Sau10I,SthI,SthAI,SthBI,SthCI,SthDI,SthEI,SthFI,SthGI,SthHI,SthJI,SthKI,SthLI,SthMI,SthNI,Uba76I,Uba85I,Uba86I,Uba87I,Uba1201I
<3>GGTAC^C
<4>4(6)
<5>Klebsiella pneumoniae OK8
<6>ATCC 49790
<7>ABCDEFGHIJKLMNOQRSTU
<8>Kiss, A., Finta, C., Venetianer, P., (1991) Nucleic Acids Res., vol. 19, pp. 3460.
Smith, D.I., Blattner, F.R., Davies, J., (1976) Nucleic Acids Res., vol. 3, pp. 343-353.
Tomassini, J., Roychoudhury, R., Wu, R., Roberts, R.J., (1978) Nucleic Acids Res., vol. 5, pp. 4055-4064.

<1>NotI
<2>CciNI,CspBI,MchAI
<3>GC^GGCCGC
<4>?(4)
<5>Nocardia otitidis-caviarum
<6>ATCC 14630
<7>ABCDEFGHJKLMNOQRSTU
<8>Borsetti, R., Wise, D., Qiang, B.-Q., Schildkraut, I., Unpublished observations.
Morgan, R.D., Unpublished observations.
Morgan, R.D., Benner, J.S., Claus, T.E., US Patent Office, 1994.
Qiang, B.-Q., Schildkraut, I., (1987) Methods Enzymol., vol. 155, pp. 15-21.

<1>TaqI
<2>CviSIII,EsaBC3I,HpyV,Hpy26II,HpyF14III,HpyF16I,HpyF23I,HpyF24I,HpyF26III,HpyF30I,HpyF35I,HpyF40II,HpyF42IV,HpyF45I,HpyF49I,HpyF52I,HpyF59III,HpyF62II,HpyF64I,HpyF65II,HpyF66IV,HpyF71I,HpyF73II,HpyJP26II,PpaAII,Taq20I,Tbr51I,TfiA3I,TfiTok4A2I,TfiTok6A1I,TflI,Tsc4aI,Tsp32I,Tsp32II,Tsp358I,Tsp505I,Tsp510I,TspAK13D21I,TspAK16D24I,TspNI,TspVi4AI,TspVil3I,Tth24I,TthHB8I,TthRQI
<3>T^CGA
<4>4(6)
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>ABCDEFGIJLMNOQRSTU
<8>Anton, B.P., Brooks, J.E., Unpublished observations.
Fomenkov, A., Xiao, J.-P., Dila, D., Raleigh, E., Xu, S.-Y., (1994) Nucleic Acids Res., vol. 22, pp. 2399-2403.
McClelland, M., (1981) Nucleic Acids Res., vol. 9, pp. 6795-6804.
Sato, S., Hutchison, C.A. III, Harris, J.I., (1977) Proc. Natl. Acad. Sci. U. S. A., vol. 74, pp. 542-546.
Zebala, J.A., (1993) Diss. Abstr., vol. 54, pp. 1394-1398.
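
The record layout shown above (numbered fields <1> to <8>, with blank lines between records and continuation lines under the last field) can be read with a short script. The following is a hedged Python sketch for inspecting a withrefm file by hand, not the parser used by rebaseextract. The function name is hypothetical. Fields are kept by number because only <1> to <3> are described in the header excerpt above. Field lines that appear before the first <1> of a block are ignored.

```python
import re

# A field line looks like "<3>G^CGC"; header lines such as
# "<ENZYME NAME>" do not match because the tag is not a single digit.
FIELD_RE = re.compile(r"^<(\d)>(.*)$")


def parse_withrefm_records(text):
    """Return a list of records, each mapping field number -> list of lines."""
    records = []
    current = None      # record being filled, or None between records
    last_field = None   # field that continuation lines belong to
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            number = int(match.group(1))
            value = match.group(2).strip()
            if number == 1:
                current = {}
                records.append(current)
            if current is None:
                continue  # part of a truncated record; skip it
            current[number] = [value] if value else []
            last_field = number
        elif line.strip() and current is not None and last_field is not None:
            # Continuation line, e.g. a further reference under <8>.
            current[last_field].append(line.strip())
        elif not line.strip():
            # A blank line closes the current record.
            current = None
            last_field = None
    return records


if __name__ == "__main__":
    with open("withrefm") as handle:  # path is an assumption
        for record in parse_withrefm_records(handle.read()):
            print(record[1][0], record.get(3, []))
```

Each field is stored as a list of lines so that multi-line reference entries are preserved without guessing how they should be joined.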

Output file format

Output files for usage example

File: REBASE


The output files are held in the REBASE subdirectory of the EMBOSS data directory. There are three: a pattern file, a reference file and a supplier file.

Data files

The "withrefm" file of a REBASE distribution is the input file for this program.

Notes

The ready-made files produced by this program may already be available at the REBASE web site: http://rebase.neb.com/rebase/rebase.files.html or http://rebase.neb.com/rebase/rebase.f37.html

References

  1. Nucleic Acids Research 27: 312-313 (1999).

Warnings

The program will warn you if the input file is incorrectly formatted.

Diagnostic Error Messages

Exit status

It exits with status 0 unless an error is reported.
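
For administrators who script the installation, the exit status can be checked from a wrapper. The sketch below is one possible way to do this in Python, assuming that rebaseextract is on the PATH and that the -inf qualifier listed under "Command line arguments" is passed explicitly. The helper names are hypothetical.

```python
import subprocess


def build_command(withrefm_path, executable="rebaseextract"):
    """Build the argument list for a non-interactive run."""
    return [executable, "-inf", withrefm_path]


def run_rebaseextract(withrefm_path, executable="rebaseextract"):
    """Run the program; return (succeeded, stderr_text).

    Raises FileNotFoundError if the executable cannot be found.
    """
    completed = subprocess.run(
        build_command(withrefm_path, executable),
        capture_output=True,
        text=True,
    )
    # Status 0 means no error was reported.
    return completed.returncode == 0, completed.stderr


if __name__ == "__main__":
    ok, messages = run_rebaseextract("../../data/withrefm")
    if not ok:
        print("rebaseextract reported an error:")
        print(messages)
```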

Known bugs

See also

Program name      Description
aaindexextract    Extract data from AAINDEX
cutgextract       Extract data from CUTG
printsextract     Extract data from PRINTS
prosextract       Builds the PROSITE motif database for patmatmotifs to search
tfextract         Extract data from TRANSFAC

Author(s)

This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)

History

Completed 12th April 1999

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments