samtools-import - converts FASTQ files to unmapped SAM/BAM/CRAM
samtools import [options] [fastq_file ...]
Reads one or more FASTQ files and converts them to unmapped SAM, BAM or CRAM.
The input files may be automatically decompressed if they have a .gz
extension.
The simplest usage in the absence of any other command line options is to
provide one or two input files.
If a single file is given, it is treated as single-ended data unless the read
names end with /1 and /2, in which case the records are flagged as PAIRED with
the READ1 or READ2 BAM flag set. If a pair of filenames is given, the files are
read alternately to produce an interleaved output file, also setting the PAIRED
and READ1 / READ2 flags.
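The alternating read of two input files can be pictured with a small shell sketch. This is only an illustration of the resulting record order, not how samtools is implemented; the helper name interleave_fastq is hypothetical, and it assumes plain 4-line FASTQ records with the same number of records in both files.

```shell
# Hypothetical helper: interleave two FASTQ files record by record.
# Assumes 4 lines per record and equal record counts in both files.
interleave_fastq() {
    awk -v mate="$2" '
        { print }                           # copy the current READ1 line
        NR % 4 == 0 {                       # a READ1 record has just ended
            for (i = 0; i < 4; i++) {       # copy the matching READ2 record
                if ((getline line < mate) > 0) {
                    print line
                }
            }
        }
    ' "$1"
}

# Usage: interleave_fastq in_1.fastq in_2.fastq > interleaved.fastq
```

The output order is READ1 record, READ2 record, READ1 record, and so on, which is the layout the -s option expects for interleaved input.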
The filenames may be explicitly labelled using -1 and -2 for READ1 and READ2
data files, -s for an interleaved paired file (or one half of a paired-end
run), -0 for unpaired data, and explicit index files specified with --i1 and
--i2. These correspond to typical output produced by Illumina bcl2fastq and
match the output from samtools fastq. The index files will set both the BC
barcode tag and its associated QT quality tag.
The Illumina CASAVA identifiers may also be processed when the -i option is
given. The identifier is parsed for the READ1 / READ2 status, for whether the
read failed processing (QCFAIL flag), and for the barcode sequence, which is
added to the BC tag. This can be an alternative to explicitly specifying the
index files, although note that doing so will not fill out the barcode quality
tag.
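As an illustration of what the CASAVA identifier carries, the sketch below decodes the comment part of a CASAVA 1.8 style header, which has the form <read>:<is filtered>:<control number>:<index sequence>. The helper name casava_fields and the example header are made up for this sketch, and the code only mirrors the kind of information -i uses; it is not the samtools parser.

```shell
# Hypothetical helper: decode the CASAVA comment of one FASTQ header line.
# The comment is the second whitespace-separated word of the header.
casava_fields() {
    printf '%s\n' "$1" | awk '{
        n = split($2, f, ":")
        if (n < 4) { print "no CASAVA identifier"; exit }
        # f[1] = read number, f[2] = "Y" if the read was filtered (QC fail),
        # f[4] = index (barcode) sequence
        printf "read=%s qcfail=%s barcode=%s\n", f[1], (f[2] == "Y" ? "yes" : "no"), f[4]
    }'
}

# Example header (invented for illustration):
casava_fields '@M01:23:000000000-ABCDE:1:1101:15589:1332 2:Y:0:ATCACG'
# prints: read=2 qcfail=yes barcode=ATCACG
```

Note that the identifier holds only the barcode sequence, which is why no barcode quality tag can be filled in from it.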
- -s FILE
- Import paired interleaved data from FILE.
- -0 FILE
- Import single-ended (unpaired) data from FILE.
Operationally there is no difference between the -s and -0 options: given an
interleaved file with /1 and /2 read name endings, both will correctly set the
PAIRED, READ1 and READ2 flags, and given data with no suffixes and no CASAVA
identifiers being processed, both will leave the data as unpaired. Both are
provided to allow more descriptive command lines and to improve the header
comment describing the samtools fastq decode command.
- -1 FILE, -2 FILE
- Import paired data from a pair of FILEs. The BAM flag
PAIRED will be set, but not PROPER_PAIR as it has not been aligned. READ1
and READ2 will be stored in their original, unmapped, orientation.
- --i1 FILE, --i2 FILE
- Specifies index barcodes associated with the -1 and
-2 files. These will be appended to READ1 and READ2 records in the
barcode (BC) and quality (QT) tags.
- -i
- Specifies that the Illumina CASAVA identifiers should be
processed. This may set the READ1, READ2 and QCFAIL flags and add a
barcode tag.
- -N, --name2
- Assume the read names are encoded in the SRA and ENA
formats where the first word is an automatically generated name with the
second field being the original name. This option extracts that second
field instead.
- --barcode-tag TAG
- Changes the auxiliary tag used for barcode sequence.
Defaults to BC.
- --quality-tag TAG
- Changes the auxiliary tag used for barcode quality.
Defaults to QT.
- -o FILE
- Output to FILE. By default output will be written to
stdout.
- --order TAG
- When outputting a SAM record, also output an integer tag
containing the Nth record number. This may be useful if the data is to be
sorted or collated in some manner and we wish this to be reversible. In
this case the tag may be used with samtools sort -t TAG to
regenerate the original input order.
- -r RG_line, --rg-line RG_line
- A complete @RG header line may be specified, with or
without the initial "@RG" component. If specified, the ID field from
RG_line will also be used in each SAM record's RG auxiliary tag.
If specified multiple times, each value is appended to the RG line, with tabs
added automatically between invocations.
- -R RG_ID, --rg RG_ID
- This is a shorter form of the option above, equivalent to
--rg-line ID:RG_ID. If both are specified then this option
is ignored.
- -u
- Output BAM or CRAM as uncompressed data.
- -T TAGLIST
- This looks for any SAM-format auxiliary tags in the comment
field of a fastq read name. These must match the
<alpha-num><alpha-num>:<type>:<data> pattern as
specified in the SAM specification. TAGLIST can be blank or
* to indicate all tags should be copied to the output, otherwise it
is a comma-separated list of tag types to include with all others being
discarded.
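The tag selection described for -T can be sketched in shell as follows. This is an approximation for illustration only: the helper name filter_comment_tags is hypothetical, it assumes the tags in the comment are separated by whitespace (so a Z-type value containing spaces would be split incorrectly), and it is not the matching code used by samtools.

```shell
# Hypothetical helper: list the SAM-style tags in a FASTQ header comment,
# keeping only those named in a comma-separated list ("" or "*" keeps all).
#   $1 = tag list, $2 = FASTQ header line
filter_comment_tags() {
    printf '%s\n' "$2" | awk -v keep="$1" '{
        for (i = 2; i <= NF; i++) {         # word 1 is the read name
            # skip words that do not look like XX:type:data
            if ($i !~ /^[A-Za-z][A-Za-z0-9]:[A-Za-z]:/) continue
            tag = substr($i, 1, 2)
            if (keep == "" || keep == "*" || index("," keep ",", "," tag ",") > 0) {
                print $i
            }
        }
    }'
}

# Usage: filter_comment_tags 'BC,RG' "$(head -n 1 in.fastq)"
```

Words in the comment that do not match the tag pattern are ignored by this sketch, and tags not named in the list are dropped.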
Convert a single-ended fastq file to an unmapped CRAM. Both of these commands
perform the same action.
samtools import -0 in.fastq -o out.cram
samtools import in.fastq > out.cram
Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM, adding
the barcode information to the BC auxiliary tag.
samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
samtools import -i in_[12].fastq > out.bam
Specify the read group. These commands are equivalent.
samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
samtools import -r ID:xyz -r PL:ILLUMINA in.fq
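Because echo -e is not interpreted the same way by every shell, the tab-separated line can also be built with printf. The sketch below uses a hypothetical helper named rg_line for this; only the way the string is assembled is shown, and the samtools call appears as a comment.

```shell
# Hypothetical helper: join "KEY:value" arguments into one tab-separated
# @RG header line using printf, which handles \t consistently.
rg_line() {
    line='@RG'
    for field in "$@"; do
        line=$(printf '%s\t%s' "$line" "$field")
    done
    printf '%s\n' "$line"
}

rg=$(rg_line ID:xyz PL:ILLUMINA)
# The result could then be passed as: samtools import -r "$rg" in.fq
```

The quoted "$rg" keeps the embedded tabs intact when the value is handed to the -r option.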
Create an unmapped BAM file from a set of 4 Illumina fastqs from bcl2fastq,
consisting of two read files and two index files. The CASAVA identifier is used
only for setting QC pass / failure status.
samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam
Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an
incremental record counter, then sort this by minimiser in order to reduce
file space. The reversal process is also shown using samtools sort and
samtools fastq.
samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
samtools sort -@4 -M -o out.srt.cram -
samtools sort -@4 -O bam -u -t ro out.srt.cram | \
samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"
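The reason a record counter makes the sort reversible can be shown with plain text tools. The sketch below is an analogy only, using hypothetical helpers tag_order and restore_order on simple one-line records rather than SAM data: each line is numbered, the lines are reordered by content, and a numeric sort on the counter restores the original order.

```shell
# Hypothetical helpers illustrating a reversible reordering.
tag_order() {
    awk '{ print NR "\t" $0 }'       # prefix each record with its input position
}
restore_order() {
    sort -n -k1,1 | cut -f2-         # sort by the counter, then drop it
}

printf 'readC\nreadA\nreadB\n' | tag_order | sort -k2,2 | restore_order
```

In the samtools pipeline above, the ro tag plays the role of the numeric prefix and samtools sort -t ro plays the role of the final numeric sort.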
Written by James Bonfield of the Wellcome Sanger Institute.
samtools(1), samtools-fastq(1)
Samtools website: <http://www.htslib.org/>