banana

 

Function

Bending and curvature plot in B-DNA

Description

banana predicts bending of a normal (B) DNA double helix, using the method of Goodsell & Dickerson, NAR 1994 11;22(24):5497-5503.

This program calculates the magnitude of local bending and macroscopic curvature at each point along an arbitrary B-DNA sequence, using any desired bending model that specifies values of twist, roll and tilt as a function of sequence.

The default model, based on the nucleosome positioning data of Satchwell et al. 1986 (J. Mol. Biol. 191, 659-675), correctly predicts experimental A-tract curvature as measured by gel retardation and cyclization kinetics, and successfully predicts curvature in regions containing phased GGGCCC sequences. (This is model 'a' described in the Goodsell & Dickerson paper.)

This model - showing local bending at mixed sequence DNA, strong bends at the sequence GGC, and straight, rigid A-tracts - is the only model, out of six models investigated in the Goodsell & Dickerson paper, that is consistent with both solution data from gel retardation and cyclization kinetics and structural data from x-ray crystallography.

The consensus sequence for DNA bending is 5 As and 5 non-As alternating. "N" is an ambiguity code for any base, and "B" is the ambiguity code for "not A" so "BANANA" is itself a bent sequence - hence the name of this program.

The program outputs both a graphical display and a text file of the results.

Background

Sequence-dependent DNA bending, like sequence-dependent protein folding, is a problem that remains frustratingly elusive. The issue has obvious biological importance in such matters as the winding of DNA in nucleosomes, or the recognition of particular DNA loci by restriction enzymes, repressors and other control proteins. The binding of the catabolite gene activator protein and of the TATA-box recognition protein to a DNA double helix are only two spectacular examples in which major bends in the helix are induced at specific sequence loci. It is of interest to consider whether the particular recognition sequences are bent even in the absence of proteins: a preformed bend in the DNA would form a custom site for protein binding, or an enhanced bendability of a given sequence would facilitate protein-induced bending.

Two possible models of sequence-dependent bending in free DNA have been proposed in the past. Nearest neighbor models propose that large-scale measurable curvature may arise by the accumulation of many small local deformations in helical twist, roll, tilt and slide at individual steps between base pairs. Junction models, on the other hand, propose that bending occurs at the interface between two different structural variants of the B-DNA double helix. Note that in both of these models, sequences which are anisotropically bendable - for instance, sequences with steps that preferentially bend only to compress the major groove - will lead to an average structure which is similar to a sequence with a rigid, intrinsic bend. The Goodsell & Dickerson paper does not distinguish between these two possibilities.

B-DNA has the special property of having its base pairs very nearly perpendicular to the overall helix axis. Hence the normal vector to each base pair can be taken as representing the local helix at that point, and curvature and bending can be studied simply by observing the behaviour of the normal vectors from one base to another along the helix. This is both easy to calculate and simple to interpret. This program displays the magnitude of bending and curvature at each point along the sequence. It is not intended as a substitute for more elaborate three-dimensional trajectory calculations, but only to express bending tendencies as a function of sequence. The power of this simple approach is in its ease of screening for regions of a given DNA sequence where phased local bends add constructively to form an overall curve.

For purposes of clarity the terms bending and curvature will be used in a restricted sense here. Bending of DNA describes the tendency for successive base pairs to be non-parallel in an additive manner over several base pair steps. Bending most commonly is produced by a rolling of adjacent base pairs over one another about their long axis, although in principle, tilting of base pairs about their short axis could make a contribution. In contrast, curvature of DNA represents the tendency of the helix axis to follow a non-linear pathway over an appreciable length, in a manner that contributes to macroscopic behaviour such as gel retardation or ease of cyclization into DNA minicircles. The distinction between local bending and macroscopic curvature is illustrated (poorly) in the following figure (see figure 1 of the Goodsell & Dickerson paper for a better view).

                       bend   bend   bend
                         -     -     -
  uncurved              / \   / \   / \
                  -----/   \-/   \-/   \-----
                          bend   bend
                  


                      
                    bend    bend
                     /-------\
                   /          \
  curved          |bend        |bend
                  |            |
                  |            |


An x-ray crystal structure analysis cannot show curvature, but can and often does show local bending. On the other hand, gel electrophoresis and cyclization kinetics can detect macroscopic curvature, but not bending. A complete knowledge of local bending would permit the precise calculation of curvature, but a knowledge of macroscopic curvature alone does not allow one to specify precisely the local bending elements that produce it. This is one of the scale paradoxes that have plagued the DNA conformation field for a decade or more. There is more than a passing resemblance to a familiar problem of classical statistical mechanics: a complete knowledge of instantaneous positions and velocities of all molecules of a gas allows one to calculate bulk properties such as temperature, pressure and volume. But the most detailed knowledge of bulk properties cannot lead one to precise molecular positions. Many molecular arrangements can produce identical bulk properties, and in the present case, many bending combinations can produce identical macroscopic curvature.

Method

The program reads a sequence and a matrix of standard twist, roll and tilt angles for each type of base pair step. This matrix is entirely at the disposal of the user, and can be altered to represent any other DNA-bending model. The program creates a table or a graphical image of the bending and the curvature at each base step.

The program begins by applying the indicated twist, roll and tilt at each step along the sequence, and calculating the resulting base pair normal vector. The first base pair is aligned normal to the z axis, with a twist value of 0.0 degrees. The specified twist is applied to the second base pair, and roll and tilt values are used to calculate its normal vector relative to the first. If either roll or tilt is non-zero, the new normal vector will be angled away from the z axis, producing the first 'bend'. The process is continued along the sequence, applying the appropriate twist, roll and tilt to each new base pair relative to its predecessor. The result is a list of normal vectors for all base pairs in the sequence.
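The accumulation of normal vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not banana's actual code: the function names are hypothetical, roll is taken as a rotation about the y axis (the long axis of the base pair), tilt as a rotation about the x axis, and the three rotations are composed in the order twist, roll, tilt. A different convention would change the details but not the idea.

```python
import math

def rot_x(deg):
    """Rotation matrix about the x axis (used here for tilt; an assumption)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(deg):
    """Rotation matrix about the y axis (used here for roll; an assumption)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(deg):
    """Rotation matrix about the z axis (helical twist)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def base_pair_normals(steps):
    """steps: list of (twist, roll, tilt) in degrees, one per base pair step.
    Returns one unit normal vector per base pair, i.e. len(steps) + 1 vectors."""
    frame = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # first base pair: normal along z
    normals = [(0.0, 0.0, 1.0)]
    for twist, roll, tilt in steps:
        step = matmul(rot_z(twist), matmul(rot_y(roll), rot_x(tilt)))
        frame = matmul(frame, step)              # orientation relative to predecessor
        # The normal is the frame's local z axis, i.e. its third column.
        normals.append((frame[0][2], frame[1][2], frame[2][2]))
    return normals
```

With roll and tilt both zero at every step, every normal stays on the z axis and no bend is produced, whatever the twist.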

Local bends are then calculated from the normal vectors. The bend for base N is calculated across a window from N-1 to N+1.
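As a minimal sketch of this window, the bend at base N can be taken as the angle between the normal vectors at N-1 and N+1. The function names are hypothetical, and reporting 0.0 for the end positions, where the window does not fit, is an assumption made for this example.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def local_bends(normals):
    """Bend at base N = angle between the normals at N-1 and N+1."""
    bends = [0.0] * len(normals)
    for n in range(1, len(normals) - 1):
        bends[n] = angle_deg(normals[n - 1], normals[n + 1])
    return bends
```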

Curvature is calculated in two steps. Base pair normals are first averaged over a 10-base-pair window to filter out the local writhing of the helix. The normals of the nine base pairs from N-4 to N+4, and the two base pairs N-5 and N+5 at half weight, are averaged and assigned to base pair N. Curvature then is calculated from these averaged normal vector values, using a bracket value, nc, with a value of 15. That is, the curvature at base pair N is the angle between averaged normal vectors at base pairs N-nc and N+nc.
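The two steps can be sketched as follows. This is an illustration rather than banana's own code: the function names are hypothetical, and leaving the curvature at 0.0 wherever a window runs off either end of the sequence is an assumption. The weights (nine full, two half) sum to 10, which is why the window is described as 10 base pairs wide.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def smooth_normals(normals):
    """Average normals over N-5..N+5, with N-5 and N+5 at half weight.
    Positions where the window does not fit are returned as None."""
    out = [None] * len(normals)
    for n in range(5, len(normals) - 5):
        acc = [0.0, 0.0, 0.0]
        for off in range(-5, 6):
            w = 0.5 if abs(off) == 5 else 1.0
            for i in range(3):
                acc[i] += w * normals[n + off][i]
        out[n] = tuple(a / 10.0 for a in acc)    # total weight is 9 + 2 * 0.5
    return out

def curvature(normals, nc=15):
    """Curvature at N = angle between averaged normals at N-nc and N+nc."""
    avg = smooth_normals(normals)
    curve = [0.0] * len(normals)
    for n in range(nc, len(normals) - nc):
        a, b = avg[n - nc], avg[n + nc]
        if a is not None and b is not None:
            curve[n] = angle_deg(a, b)
    return curve
```

Under these assumptions the first position that can receive a curvature value is the 21st base pair, since both the averaging window (5) and the bracket (15) must fit.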

Usage

Here is a sample session with banana


% banana -graph data 
Bending and curvature plot in B-DNA
Input sequence: tembl:rnu68037

Created banana.dat

Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-sequence]          sequence   Sequence USA
*  -graph              graph      Graph type

   Optional qualifiers:
   -anglesfile         datafile   angles file
   -residuesperline    integer    Number of residues to be displayed on each
                                  line
   -outfile            outfile    Output file name

   Advanced qualifiers:
   -data               boolean    Output as data

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose


Mandatory qualifiers:
  [-sequence] (Parameter 1)
     Description:     Sequence USA
     Allowed values:  Readable sequence
     Default:         Required
  -graph
     Description:     Graph type
     Allowed values:  EMBOSS has a list of known devices, including
                      postscript, ps, hpgl, hp7470, hp7580, meta, colourps,
                      cps, xwindows, x11, tektronics, tekt, tek4107t, tek,
                      none, null, text, data, xterm, png
     Default:         EMBOSS_GRAPHICS value, or x11

Optional qualifiers:
  -anglesfile
     Description:     angles file
     Allowed values:  Data file
     Default:         Eangles_tri.dat
  -residuesperline
     Description:     Number of residues to be displayed on each line
     Allowed values:  Any integer value
     Default:         50
  -outfile
     Description:     Output file name
     Allowed values:  Output file
     Default:         banana.profile

Advanced qualifiers:
  -data
     Description:     Output as data
     Allowed values:  Boolean value Yes/No
     Default:         No

Input file format

Any DNA sequence USA.

Input files for usage example

'tembl:rnu68037' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:rnu68037

ID   RNU68037   standard; RNA; ROD; 1218 BP.
XX
AC   U68037;
XX
SV   U68037.1
XX
DT   23-SEP-1996 (Rel. 49, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 2)
XX
DE   Rattus norvegicus EP1 prostanoid receptor mRNA, complete cds.
XX
KW   .
XX
OS   Rattus norvegicus (Norway rat)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
XX
RN   [1]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   "Cloning of the rat EP1 prostanoid receptor";
RL   Unpublished.
XX
RN   [2]
RP   1-1218
RA   Abramovitz M., Boie Y.;
RT   ;
RL   Submitted (26-AUG-1996) to the EMBL/GenBank/DDBJ databases.
RL   Biochemistry & Molecular Biology, Merck Frosst Center for Therapeutic
RL   Research, P. O. Box 1005, Pointe Claire - Dorval, Quebec H9R 4P8, Canada
XX
DR   SWISS-PROT; P70597; PE21_RAT.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1218
FT                   /db_xref="taxon:10116"
FT                   /organism="Rattus norvegicus"
FT                   /strain="Sprague-Dawley"
FT   CDS             1..1218
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P70597"
FT                   /note="family 1 G-protein coupled receptor"
FT                   /product="EP1 prostanoid receptor"
FT                   /protein_id="AAB07735.1"
FT                   /translation="MSPYGLNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSM
FT                   TLGAVSNVLALALLAQVAGRLRRRRSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAG
FT                   RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLIHAARVSVARARLALALL
FT                   AAMALAVALLPLVHVGHYELQYPGTWCFISLGPPGGWRQALLAGLFAGLGLAALLAALV
FT                   CNTLSGLALLRARWRRRRSRRFRENAGPDDRRRWGSRGLRLASASSASSITSTTAALRS
FT                   SRGGGSARRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAV
FT                   RLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLTKSAWEASSLRSSRH
FT                   SGFSHL"
XX
SQ   Sequence 1218 BP; 162 A; 397 C; 387 G; 272 T; 0 other;
     atgagcccct acgggcttaa cctgagccta gtggatgagg caacaacgtg tgtaacaccc        60
     agggtcccca atacatctgt ggtgctgcca acaggcggta acggcacatc accagcgctg       120
     cctatcttct ccatgacgct gggtgctgtg tccaacgtgc tggcgctggc gctgctggcc       180
     caggttgcag gcagactgcg gcgccgccgc tcgactgcca ccttcctgtt gttcgtcgcc       240
     agcctgcttg ccatcgacct agcaggccat gtgatcccgg gcgccttggt gcttcgcctg       300
     tatactgcag gacgtgcgcc cgctggcggg gcctgtcatt tcctgggcgg ctgtatggtc       360
     ttctttggcc tgtgcccact tttgcttggc tgtggcatgg ccgtggagcg ctgcgtgggt       420
     gtcacgcagc cgctgatcca cgcggcgcgc gtgtccgtag cccgcgcacg cctggcacta       480
     gccctgctgg ccgccatggc tttggcagtg gcgctgctgc cactagtgca cgtgggtcac       540
     tacgagctac agtaccctgg cacttggtgt ttcattagcc ttgggcctcc tggaggttgg       600
     cgccaggcgt tgcttgcggg cctcttcgcc ggccttggcc tggctgcgct ccttgccgca       660
     ctagtgtgta atacgctcag cggcctggcg ctccttcgtg cccgctggag gcggcgtcgc       720
     tctcgacgtt tccgagagaa cgcaggtccc gatgatcgcc ggcgctgggg gtcccgtgga       780
     ctccgcttgg cctccgcctc gtctgcgtca tccatcactt caaccacagc tgccctccgc       840
     agctctcggg gaggcggctc cgcgcgcagg gttcacgcac acgacgtgga aatggtgggc       900
     cagctcgtgg gcatcatggt ggtgtcgtgc atctgctgga gccccctgct ggtattggtg       960
     gtgttggcca tcgggggctg gaactctaac tccctgcagc ggccgctctt tctggctgta      1020
     cgcctcgcgt cgtggaacca gatcctggac ccatgggtgt acatcctgct gcgccaggct      1080
     atgctgcgcc aacttcttcg cctcctaccc ctgagggtta gtgccaaggg tggtccaacg      1140
     gagctgagcc taaccaagag tgcctgggag gccagttcac tgcgtagctc ccggcacagt      1200
     ggcttcagcc acttgtga                                                    1218
//

Output file format

The output goes both to a graphical display and to a text file with the default name 'banana.profile'.

The graphical display shows the sequence together with the local bending (solid line) and macroscopic curvature (dotted line).

Output files for usage example

File: banana.profile

Base   Bend      Curve
a       0.0      0.0
t      19.7      0.0
g      17.7      0.0
a      21.1      0.0
g      28.5      0.0
c      26.2      0.0
c      19.7      0.0
c      18.7      0.0
c      12.5      0.0
t       9.7      0.0
a      14.9      0.0
c      16.5      0.0
g      17.5      0.0
g      26.2      0.0
g      28.5      0.0
c      20.7      0.0
t      11.7      0.0
t       6.4      0.0
a       9.3      0.0
a      14.9      0.0
c      17.7     20.0
c      15.7     19.2
t      15.7     18.5
g      17.7     17.9
a      21.1     17.1
g      28.5     15.9
c      25.2     14.6
c      12.5     13.3
t       7.2     11.9
a      13.2     10.8
g      20.1     10.1
t      19.5      9.6
g      15.1      9.2
g      14.9      9.1
a      19.5      9.5
t      19.7     10.2
g      17.7     10.8
a      17.7     11.0
g      25.2     11.2
g      26.2     11.3
c      15.3     11.5
a      11.4     11.7
a      14.5     12.0
c      13.9     12.2
a      11.4     12.3
a      14.9     12.5
c      17.7     12.8
g      19.5     13.3
t      19.1     13.5


  [Part of this file has been deleted for brevity]

g      15.1     15.2
a      17.7     15.5
g      25.2     15.8
g      32.5     16.0
c      25.2     15.8
c      15.7     15.0
a      16.3     14.2
g      15.5     13.5
t      10.8     12.8
t      13.7     12.3
c      19.5     12.1
a      20.1     12.1
c      16.3     12.1
t      16.7     11.9
g      22.1     11.4
c      21.1     11.1
g      14.9     10.7
t       9.7     10.3
a      16.1      9.8
g      24.5      9.4
c      21.1      8.9
t      15.1      8.4
c      16.1      7.7
c      17.5      7.3
c      15.3      6.9
g      24.0      6.4
g      26.2      5.8
c      20.5      5.4
a      19.1      5.1
c      15.3     26.0
a      16.3      0.0
g      20.1      0.0
t      19.5      0.0
g      25.2      0.0
g      28.5      0.0
c      20.7      0.0
t      13.3      0.0
t      13.7      0.0
c      15.7      0.0
a      19.1      0.0
g      28.5      0.0
c      25.2      0.0
c      19.5      0.0
a      20.1      0.0
c      17.9      0.0
t      13.9      0.0
t      13.9      0.0
g      19.1      0.0
t      19.5      0.0
g       0.0      0.0
a       0.0      0.0

The data file consists of three columns separated by blanks or tab characters.

The first column is the sequence.
The second column is the local bending.
The third is the curvature.
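A file in this format is easy to read back for further analysis. The following is a minimal sketch, assuming the layout shown in the example output (a "Base Bend Curve" header line followed by one row per base); parse_profile is a hypothetical helper and not part of EMBOSS.

```python
def parse_profile(text):
    """Parse banana profile text into a list of (base, bend, curve) tuples.
    Lines that do not have exactly three columns, the header line, and
    lines whose numeric columns do not parse are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()            # splits on blanks or tabs
        if len(fields) != 3 or fields[0] == "Base":
            continue
        try:
            rows.append((fields[0], float(fields[1]), float(fields[2])))
        except ValueError:
            continue
    return rows

sample = "Base   Bend      Curve\na       0.0      0.0\nt      19.7      0.0\n"
rows = parse_profile(sample)
```

For a real run, the text would come from reading the output file, for example the default 'banana.profile'.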

Data files

banana reads an angles file giving the twist, roll and tilt angles. By default Eangles_tri.dat is used, as in Goodsell & Dickerson, NAR 1994 11;22(24):5497-5503 and Satchwell, Drew & Travers (1986) J. Mol. Biol. 191, 659-675.

The description of this bending model is as follows:

The roll-tilt-twist parameters of this model are derived purely from experimental observations of sequence location preferences of base trimers in small circles of DNA, without reference to solution techniques that measure curvature per se. For this reason, they may be the most objective and unbiased parameters of all. Satchwell, Drew and Travers studied the positioning of DNA sequences wrapped around nucleosome cores, and in closed circles of double-helical DNA of comparable size. From the sequence data they calculated a fractional preference of each base pair triplet for a position 'facing out', or with the major groove on the concave side of the curved helix. The sequence GGC, for example, has a 45% preference for locations on a bent double helix in which its major groove faces inward and is compressed by the curvature (tending towards positive roll), whereas sequence AAA has a 36% preference for the opposite orientation, with major groove facing outward and with minor groove facing inward and compressed (tending toward negative roll). These fractional variances have been converted into roll angles in the following manner: because x-ray crystal structure analysis uniformly indicates that AA steps are unbent, a zero roll is assigned to the AAA triplet; an arbitrary maximum roll of 10 degrees is assigned to GGC, and all other triplets are scaled in a linear manner. Where % is the signed percent-out figure (+45 for GGC, -36 for AAA), then:

         Roll = 10 degrees * (% + 36)/(45 + 36)

Changing the maximum roll value will scale the entire profile up or down proportionately, but will not change the shape of the profile. Peaks will remain peaks, and valleys, valleys. The absolute magnitude of all the roll values is less important than their relative magnitude, or the order of roll preference. Twist angles were set to zero. Because these values correspond to base trimers, the values of roll, tilt and twist were applied to the first two bases for the calculation.
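The linear scaling can be written as a one-line function. This sketch assumes that the percent-out figure is signed, so that the AAA preference for the opposite orientation enters as -36; roll_from_percent and its parameter names are hypothetical.

```python
def roll_from_percent(percent_out, max_roll=10.0, low=-36.0, high=45.0):
    """Scale a signed percent-out figure linearly to a roll angle in degrees.
    low maps to zero roll (AAA), high maps to max_roll (GGC)."""
    return max_roll * (percent_out - low) / (high - low)

roll_aaa = roll_from_percent(-36.0)                 # 0.0, AA steps are unbent
roll_ggc = roll_from_percent(45.0)                  # 10.0, the chosen maximum
roll_ggc_20 = roll_from_percent(45.0, max_roll=20.0)  # 20.0, same shape, scaled
```

Doubling max_roll doubles every roll value, which is the proportional scaling described above.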

Notes

None.

References

  1. Goodsell, D.S. & Dickerson, R.E. (1994) "Bending and Curvature Calculations in B-DNA" Nucl. Acids. Res. 22, 5497-5503.
  2. Satchwell, S.C., Drew, H.R. & Travers, A.A. (1986) J. Mol. Biol. 191, 659-675.

Warnings

Only the characters A, C, G and T are allowed. If the sequence contains any other character, the program exits with a fatal error message.
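A script that drives banana may want to check a sequence before running it. The following is a hypothetical pre-check, not part of EMBOSS; it treats the sequence as case-insensitive, on the assumption that lower-case input such as the example entry is accepted.

```python
def invalid_bases(seq):
    """Return the sorted list of distinct characters in seq other than A, C, G, T."""
    return sorted(set(seq.upper()) - set("ACGT"))

ok = invalid_bases("atgagcccct")     # [] means the sequence is acceptable
bad = invalid_bases("ATGNNB")        # ambiguity codes are reported
```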

Diagnostic Error Messages

None.

Exit status

0 if successful.

Known bugs

None.

See also

Program name   Description
btwisted       Calculates the twisting in a B-DNA sequence
chaos          Create a chaos game representation plot for a sequence
compseq        Counts the composition of dimer/trimer/etc words in a sequence
dan            Calculates DNA RNA/DNA melting temperature
freak          Residue/base frequency table or plot
isochore       Plots isochores in large DNA sequences
sirna          Finds siRNA duplexes in mRNA
wordcount      Counts words of a specified size in a DNA sequence

Author(s)

This application was written by Ian Longden (il@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.

History

The original program ('BEND') is described in the Goodsell & Dickerson paper.

Created 1999/06/09.
Last Updated 1999/06/14.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments