BamInput {Rsamtools} | R Documentation |
Import binary ‘BAM’ files into a list structure, with facilities for selecting what fields and which records are imported.
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
countBam(file, index=file, ..., param=ScanBamParam())
scanBamHeader(files, ...)
## S4 method for signature 'character'
scanBamHeader(files, ...)
asBam(file, destination, ...)
## S4 method for signature 'character'
asBam(file, destination, ..., overwrite=FALSE, indexDestination=TRUE)
filterBam(file, destination, index=file, ...)
## S4 method for signature 'character'
filterBam(file, destination, index=file, ..., indexDestination=TRUE,
    param=ScanBamParam(what=scanBamWhat()))
sortBam(file, destination, ...)
## S4 method for signature 'character'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
indexBam(files, ...)
## S4 method for signature 'character'
indexBam(files, ...)
file |
The character(1) file name of the 'BAM' ('SAM' for asBam) file to be processed. |
files |
The character() file names of the 'BAM' files to be processed. |
index |
The character(1) name of the index file of the 'BAM' file being processed; this is given without the '.bai' extension. |
destination |
The character(1) file name of the location where
the sorted or filtered output file will be created. For |
... |
Additional arguments, passed to methods. |
overwrite |
A logical(1) indicating whether the destination can be over-written if it already exists. |
indexDestination |
A logical(1) indicating whether the created destination file should also be indexed. |
byQname |
A logical(1) indicating whether the sorted destination file should be sorted by Query-name (TRUE) or by mapping position (FALSE). |
maxMemory |
A numeric(1) indicating the maximum amount of memory (in MB) that the function is allowed to use. |
param |
An instance of ScanBamParam. |
The scanBam function parses binary BAM files; text SAM files can be
parsed using R's scan function, especially with the argument what to
control the fields that are parsed.
countBam returns a count of records consistent with param.
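As a minimal sketch of how param restricts the count, the following compares the total number of records with the number on the plus strand. It assumes the ex1.bam file shipped in the package's extdata directory (the same file used in the examples on this page); the variable names are illustrative only.

```r
library(Rsamtools)

## Example BAM file distributed with Rsamtools
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## Count every record in the file (default, empty ScanBamParam)
cnt_all <- countBam(fl)

## Count only records that are not on the minus strand
p_plus <- ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE))
cnt_plus <- countBam(fl, param=p_plus)

## countBam returns a data.frame; 'records' holds the count
cnt_all$records
cnt_plus$records
```

The filtered count can never exceed the unfiltered one, which makes this a quick sanity check on a flag filter before running a full scanBam.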
scanBamHeader visits the header information in a BAM file, returning
for each file a list containing elements targets and text, as described
below. The SAM / BAM specification does not require that the content of
the header be consistent with the content of the file, e.g., more
targets may be present than are represented by reads in the file.
asBam converts 'SAM' files to 'BAM' files, equivalent to
samtools view -Sb file > destination. The 'BAM' file is sorted and an
index is created on the destination (with extension '.bai') when
indexDestination=TRUE.
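A small sketch of the conversion, under the assumption that a hand-written two-record SAM file is acceptable input. The reference name chr1, the read names and the sequences are invented for illustration. The returned file name is used rather than a guess at how the destination is suffixed.

```r
library(Rsamtools)

## Write a tiny, hypothetical SAM file: one reference, two reads
sam <- tempfile(fileext=".sam")
writeLines(c(
    "@SQ\tSN:chr1\tLN:100",
    "r1\t0\tchr1\t10\t30\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchr1\t20\t30\t5M\t*\t0\t0\tTTGCA\tIIIII"
), sam)

## Convert to BAM; asBam returns the name of the BAM file it created
bam <- asBam(sam, tempfile())

## The result can be read back with the other functions on this page
countBam(bam)$records
```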
filterBam parses records in file satisfying the bamWhich of param,
writing each record satisfying the bamFlag and bamSimpleCigar criteria
of param to file destination. An index file is created on the
destination when indexDestination=TRUE.
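The filtering step might be used as follows. This is a sketch that assumes the packaged ex1.bam file and keeps only plus-strand records; the choice of flag is arbitrary.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## Keep only records that are not on the minus strand
p <- ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE))

## Write the matching records to a new file; the destination is indexed
## because indexDestination defaults to TRUE
filtered <- filterBam(fl, tempfile(), param=p)

## Compare record counts before and after filtering
countBam(fl)$records
countBam(filtered)$records
```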
sortBam sorts the BAM file given as its first argument, analogous to
the 'samtools sort' function.
indexBam creates an index for each BAM file specified, analogous to the
'samtools index' function.
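Sorting and indexing are commonly chained. The sketch below assumes the packaged ex1.bam file and sorts by mapping position (the default, byQname=FALSE), since an index is only meaningful for a position-sorted file.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## Sort by mapping position; the return value is the sorted file's name
sorted <- sortBam(fl, tempfile())

## Index the sorted file; the return value is the index file's name
idx <- indexBam(sorted)

sorted
idx
```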
Details of the ScanBamParam class are provided on its help page;
several salient points are reiterated here. ScanBamParam can contain a
field what, specifying the components of the BAM records to be
returned. Valid values of what are available with scanBamWhat.
ScanBamParam can contain an argument which that specifies a subset of
reads to return. This requires that the BAM file be indexed, and that
the index file be named following samtools convention as
<bam_filename>.bai. ScanBamParam can contain an argument tag to specify
which tags will be extracted.
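A sketch combining the three arguments, assuming the packaged ex1.bam file (which ships with an index), a single illustrative region on seq1, and the NM tag being present in that file:

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

p <- ScanBamParam(
    what=c("qname", "pos"),                        # fields to return
    which=GRanges("seq1", IRanges(1000, 2000)),    # region to query
    tag="NM"                                       # optional tag to extract
)
res <- scanBam(fl, param=p)

length(res)        # one element per queried range
names(res[[1]])    # requested fields, plus a 'tag' element
names(res[[1]][["tag"]])
```

Restricting what and which in this way limits both the fields parsed and the records visited, which matters for large files.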
The scanBam,character-method returns a list of lists. The outer list
groups results from each Ranges list of bamWhich(param); the outer list
is of length one when bamWhich(param) has length 0. Each inner list
contains elements named after scanBamWhat(); elements omitted from
bamWhat(param) are removed. The content of non-null elements is as
follows, taken from the description in the samtools API documentation:
qname: This is the QNAME field in SAM Spec v1.4. The query name, i.e., identifier, associated with the read.
flag: This is the FLAG field in SAM Spec v1.4. A numeric value summarizing details of the read. See ScanBamParam and the flag argument, and scanBamFlag().
rname: This is the RNAME field in SAM Spec v1.4. The name of the reference to which the read is aligned.
strand: The strand to which the read is aligned.
pos: This is the POS field in SAM Spec v1.4. The genomic coordinate at the start of the alignment. Coordinates are 'left-most', i.e., at the 3' end of a read on the '-' strand, and 1-based. The position excludes clipped nucleotides, even though soft-clipped nucleotides are included in seq.
qwidth: The width of the query, as calculated from the cigar encoding; normally equal to the width of the query returned in seq.
mapq: This is the MAPQ field in SAM Spec v1.4. The MAPping Quality.
cigar: This is the CIGAR field in SAM Spec v1.4. The CIGAR string.
mrnm: This is the RNEXT field in SAM Spec v1.4. The reference to which the mate (of a paired end or mate pair read) aligns.
mpos: This is the PNEXT field in SAM Spec v1.4. The position to which the mate aligns.
isize: This is the TLEN field in SAM Spec v1.4. Inferred insert size for paired end alignments.
seq: This is the SEQ field in SAM Spec v1.4. The query sequence, in the 5' to 3' orientation. If aligned to the minus strand, it is the reverse complement of the original sequence.
qual: This is the QUAL field in SAM Spec v1.4. Phred-encoded, phred-scaled base quality score, oriented as seq.
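Because each inner list holds parallel vectors, the atomic fields can be gathered into a data.frame. This is a sketch assuming the packaged ex1.bam file; seq and qual are left out because they are not plain vectors.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## Request only fields that are returned as ordinary vectors or factors
p <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth", "mapq"))
res <- scanBam(fl, param=p)[[1]]

## Every element has one entry per record, so they bind column-wise
df <- as.data.frame(res)
head(df)
```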
scanBamHeader returns a list, with one element for each file named in
files. Each element is a list containing two elements. The targets
element contains target (reference) sequence lengths. The text element
is itself a list with each element a list corresponding to tags (e.g.,
'@SQ') found in the header, and the associated tag values.
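The structure described above can be inspected directly. A sketch, assuming the packaged ex1.bam file:

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

hdr <- scanBamHeader(fl)
length(hdr)           # one element per file

h <- hdr[[1]]
names(h)              # includes "targets" and "text"
h[["targets"]]        # reference sequence lengths, named by sequence
names(h[["text"]])    # header tags found in the file, e.g. '@SQ'
```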
asBam returns the file name of the BAM file.
sortBam returns the file name of the sorted file.
indexBam returns the file name of the index file created.
filterBam returns the file name of the destination file created.
Martin Morgan <mtmorgan@fhcrc.org>. Thomas Unterthiner
<thomas.unterthiner@students.jku.at> (sortBam).
http://samtools.sourceforge.net/
ScanBamParam, scanBamWhat, scanBamFlag
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

## read all fields of all records
res0 <- scanBam(fl)[[1]] # always list-of-lists
names(res0)
length(res0[["qname"]])
lapply(res0, head, 3)
table(width(res0[["seq"]])) # query widths
table(res0[["qwidth"]], useNA="always") # query widths derived from cigar
table(res0[["cigar"]], useNA="always")
table(res0[["strand"]], useNA="always")
table(res0[["flag"]], useNA="always")

## restrict to genomic ranges; one result element per range
which <- RangesList(seq1=IRanges(1000, 2000),
                    seq2=IRanges(c(100, 1000), c(1000, 2000)))
p1 <- ScanBamParam(which=which, what=scanBamWhat())
res1 <- scanBam(fl, param=p1)
names(res1)
names(res1[[2]])

## restrict the fields returned
p2 <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth"))
res2 <- scanBam(fl, param=p2)

## restrict by flag
p3 <- ScanBamParam(flag=scanBamFlag(isMinusStrand=FALSE))
length(scanBam(fl, param=p3)[[1]])

sorted <- sortBam(fl, tempfile())

## map values(which) to output, e.g., of countBam
gwhich <- as(which, "GRanges")[c(2, 1, 3)]
values(gwhich)[["OriginalOrder"]] <- 1:3
cnt <- countBam(fl, param=ScanBamParam(which=gwhich))
cntVals <- unlist(split(values(gwhich), seqnames(gwhich)))
cbind(cnt, as.data.frame(cntVals))