BamFile {Rsamtools}R Documentation

Maintain SAM and BAM files

Description

Use BamFile() to create a reference to a BAM file (and optionally its index). The reference remains open across calls to methods, avoiding costly index re-loading.

BamFileList() provides a convenient way of managing a list of BamFile instances.

Usage


## Constructors

BamFile(file, index=file)
BamFileList(...)

## Opening / closing

## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)

## accessors; also path(), index()

## S4 method for signature 'BamFile'
isOpen(con, rw="")

## actions

## S4 method for signature 'BamFile'
scanBamHeader(files, ...)
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
    indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFile'
readBamGappedAlignments(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedReads(file, index=file, use.names=FALSE, param=NULL)

## counting
## S4 method for signature 'GRanges,BamFileList'
summarizeOverlaps(
    features, reads, mode, ignore.strand = FALSE, ..., param = ScanBamParam()) 

Arguments

...

Additional arguments. For BamFileList, this can either be a single character vector of paths to BAM files, or several instances of BamFile objects.

con

An instance of BamFile.

x, file, files

A character vector of BAM file paths (for BamFile) or a BamFile instance (for other methods).

index

A character vector of indices (for BamFile); ignored for all other methods on this page.

destination

character(1) file path to write filtered reads to.

indexDestination

logical(1) indicating whether the destination file should also be indexed.

byQname, maxMemory

See sortBam.

param

An optional ScanBamParam instance to further influence scanning, counting, or filtering.

use.names

Construct the names of the returned object from the query template names (QNAME field)? If not (the default), then the returned object has no names.

rw

Mode of file; ignored.

reads

A BamFileList that represents the data to be counted by summarizeOverlaps.

features

A GRanges or a GRangesList object of genomic regions of interest. When a GRanges is supplied, each row is considered a feature. When a GRangesList is supplied, each higher list-level is considered a feature. This distinction is important when defining an overlap between a read and a feature. See examples for details.

mode

A function that defines the method to be used when a read overlaps more than one feature. Pre-defined options are "Union", "IntersectionStrict", or "IntersectionNotEmpty" and are designed after the counting modes available in the HTSeq package by Simon Anders (see references).

  • "Union" : (Default) Reads that overlap any portion of exactly one feature are counted. Reads that overlap multiple features are discarded.

  • "IntersectionStrict" : A read must fall completely "within" the feature to be counted. If a read overlaps multiple features but falls "within" only one, the read is counted for that feature. If the read is "within" multiple features, the read is discarded.

  • "IntersectionNotEmpty" : A read must fall in a unique disjoint region of a feature to be counted. When a read overlaps multiple features, the features are partitioned into disjoint intervals. Regions that are shared between the features are discarded leaving only the unique disjoint regions. If the read overlaps one of these remaining regions, it is assigned to the feature the unique disjoint region came from.

ignore.strand

A logical value indicating if strand should be considered when matching.

Objects from the Class

Objects are created by calls of the form BamFile().

Fields

The BamFile class inherits fields from the RsamtoolsFile class.

Functions and methods

BamFileList inherits methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.BamFile

Opens the (local or remote) path and index (if bamIndex is not character(0)), files. Returns a BamFile instance.

close.BamFile

Closes the BamFile con; returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.

Accessors:

path

Returns a character(1) vector of BAM path names.

index

Returns a character(1) vector of BAM index path names.

Methods:

scanBamHeader

Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.

seqinfo

Visit the path in path(file), returning a Seqinfo instance containing information on the lengths of each sequence.

scanBam

Visit the path in path(file), returning the result of scanBam applied to the specified path.

countBam

Visit the path(s) in path(file), returning the result of countBam applied to the specified path.

filterBam

Visit the path in path(file), returning the result of filterBam applied to the specified path.

indexBam

Visit the path in path(file), returning the result of indexBam applied to the specified path.

sortBam

Visit the path in path(file), returning the result of sortBam applied to the specified path.

readBamGappedAlignments, readBamGappedReads

Visit the path in path(file), returning the result of readBamGappedAlignments or readBamGappedReads applied to the specified path. See readBamGappedAlignments.

show

Compactly display the object.

Author(s)

Martin Morgan and Marc Carlson

See Also

The GenomicRanges package is where the summarizeOverlaps method originates.

Examples


fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- open(BamFile(fl))        # implicit index
bf
identical(scanBam(bf), scanBam(fl))

rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))

## repeatedly visit 'bf'
sapply(seq_len(length(rng)), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)


##---------------------------------------------------------------------------##
## How to use summarizeOverlaps with a BamFileList object.
fls = list.files(system.file("extdata",package="GenomicRanges"),
                 recursive=TRUE, pattern="*bam$", full=TRUE)
bfs <- BamFileList(fls)

## "features" will be the argument for an "annotations" object (GRanges
## or GRangesList object.
group_id <- c("A", "B", "C", "C", "D", "D", "E", "F", "G", "H", "H")
features <- GRanges(
    seqnames = Rle(c("chr2L", "chr2R", "chr2L", "chr2R", "chr2L", "chr2R", 
        "chr2L", "chr2R", "chr2R", "chr3L", "chr3L")),
    strand = strand(rep("+", length(group_id))),
    ranges = IRanges(
        start=c(1000, 2000, 3000, 3600, 7000, 7500, 4000, 4000, 3000, 5000, 5400),
        width=c(500, 900, 500, 300, 600, 300, 500, 900, 500, 500, 500)),
   DataFrame(group_id)
)
## Then call the method:
summarizeOverlaps(features, bfs, mode = Union, ignore.strand=TRUE)



[Package Rsamtools version 1.6.3 Index]