cleanasn - clean up irregularities in NCBI ASN.1 objects
cleanasn [-] [-A filename] [-B str] [-C str] [-D str] [-F str] [-K str]
         [-L filename] [-M filename] [-N str] [-O str] [-P str] [-Q str]
         [-R] [-S str] [-T] [-U str] [-V str] [-X str] [-Z str] [-a str]
         [-b] [-c] [-d str] [-f str] [-i filename] [-j filename]
         [-k filename] [-m str] [-n path] [-o filename] [-p path]
         [-q path] [-r path] [-v path] [-x ext]
cleanasn is a utility program to clean up irregularities in NCBI ASN.1
objects.
A summary of options is included below.
-
    Print usage message

-A filename
    Accession list file

-B str
    Branch, per the flags in str:
        c    Has coding regions
        d    No coding regions
        p    Passes validation
        q    Validator errors or rejects
        r    Only pop/phy/mut/eco/WGS sets
        s    Exclude pop/phy/mut/eco/WGS sets
        t    Only nuc-prot sets
        u    Exclude nuc-prot sets
        v    Only segmented sequences
        w    Exclude segmented sequences
        x    Only segmented proteins
        y    Exclude segmented proteins

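Several -B flags form opposing pairs (c/d, p/q, r/s, t/u, v/w, x/y), so a script that builds -B strings may want to catch contradictory combinations before calling cleanasn. The Python sketch below checks a flag string against the letters listed above. Treating each pair as mutually exclusive is an assumption drawn from the descriptions, not documented cleanasn behavior, and check_branch_flags is a hypothetical name.

```python
# Hypothetical pre-flight check for a cleanasn -B flag string.
# Letters and descriptions are copied from the -B list; treating each
# "only"/"exclude" (or has/has-not) pair as mutually exclusive is an
# assumption, not documented cleanasn behavior.

BRANCH_FLAGS = {
    "c": "Has coding regions",
    "d": "No coding regions",
    "p": "Passes validation",
    "q": "Validator errors or rejects",
    "r": "Only pop/phy/mut/eco/WGS sets",
    "s": "Exclude pop/phy/mut/eco/WGS sets",
    "t": "Only nuc-prot sets",
    "u": "Exclude nuc-prot sets",
    "v": "Only segmented sequences",
    "w": "Exclude segmented sequences",
    "x": "Only segmented proteins",
    "y": "Exclude segmented proteins",
}

# Assumed opposing pairs: selecting both members would match nothing.
OPPOSING_PAIRS = [("c", "d"), ("p", "q"), ("r", "s"),
                  ("t", "u"), ("v", "w"), ("x", "y")]


def check_branch_flags(flags: str) -> list[str]:
    """Return the problems found in a -B flag string (empty list if none)."""
    problems = []
    for letter in flags:
        if letter not in BRANCH_FLAGS:
            problems.append(f"unknown -B flag {letter!r}")
    for a, b in OPPOSING_PAIRS:
        if a in flags and b in flags:
            problems.append(f"-B flags {a!r} and {b!r} conflict")
    return problems


if __name__ == "__main__":
    print(check_branch_flags("ct"))   # []
    print(check_branch_flags("tuz"))  # one unknown letter, one conflict
```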
-C str
    Sequence operations, per the flags in str:
        c    Compress
        d    Decompress
        l    Recalculate segmented sequence length
        v    Virtual gaps inside segmented sequence
        s    Convert segmented set to delta sequence
        t    Non-NucProt segmented set to delta sequence
        u    Improved non-NucProt segmented set to delta sequence
        g    Raw to delta by assembly gap
        m    Merge assembly gap features

-D str
    Clean up descriptors, per the flags in str:
        t    Remove Title
        c    Remove Comment
        n    Remove Nuc-Prot Set title
        e    Remove Pop/Phy/Mut/Eco Set title
        m    Remove mRNA title
        p    Remove Protein title
        a    Title to name
        b    AutoDef title or name
        x    Prefix title with organism name

-F str
    Clean up features, per the flags in str:
        u    Remove User-objects
        d    Remove db_xrefs
        e    Remove /evidence and /inference
        g    Fuse multi-interval genes
        i    Fuse adjacent-interval imported features
        r    Remove redundant gene xrefs
        f    Fuse duplicate features
        s    Package features on referenced Bioseq
        k    Package coding-region or parts features
        z    Delete or update EC numbers
        b    Set Best coding-region reading frame
        x    Retranslate coding regions
        a    Adjust for missing stop codon

-K str
    Perform a general cleanup, per the flags in str:
        b    BasicSeqEntryCleanup
        p    C++ BasicCleanup (via an external utility)
        v    AdvancedSeqEntryCleanup
        s    SeriousSeqEntryCleanup
        x    ExtendedSeqEntryCleanup
        g    GpipeSeqEntryCleanup
        n    Normalize descriptor order
        u    Remove NcbiCleanup User-objects
        c    Synchronize genetic codes
        f    CDS partial from translation
        e    Impose CDS partials
        d    Resynchronize CDS partials
        m    Resynchronize mRNA partials
        t    Resynchronize Peptide partials
        a    Adjust consensus splice
        i    Promote to "worst" Seq-ID
        r    Reassign local IDs
        l    Remove locus

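For a single file, -i and -o name the input and output and -K selects the cleanup. The Python sketch below assembles that argument list and runs it with subprocess. It assumes cleanasn is on PATH; build_cleanup_args and run_cleanup are hypothetical helpers, and the meaning of cleanasn's exit status is not documented here.

```python
import subprocess


def build_cleanup_args(infile: str, outfile: str, k_flags: str = "b") -> list[str]:
    """Build a cleanasn argv for a single-file general cleanup (-K).

    The default "b" requests BasicSeqEntryCleanup.
    """
    return ["cleanasn", "-i", infile, "-o", outfile, "-K", k_flags]


def run_cleanup(infile: str, outfile: str, k_flags: str = "b") -> int:
    """Run cleanasn (assumed to be on PATH) and return its exit status."""
    completed = subprocess.run(build_cleanup_args(infile, outfile, k_flags))
    return completed.returncode


if __name__ == "__main__":
    print(build_cleanup_args("in.asn", "out.asn", "sn"))
```

For example, build_cleanup_args("in.asn", "out.asn", "sn") requests SeriousSeqEntryCleanup plus descriptor-order normalization. Passing an argument list rather than a shell string avoids quoting problems with file names.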
-L filename
    Log file

-M filename
    Macro file

-N str
    Clean up links, per the flags in str:
        o    Link CDS mRNA by Overlap
        p    Link CDS mRNA by Product
        l    Link CDS mRNA by Label and Location
        r    Reassign feature IDs
        m    Merge colliding feature IDs
        f    Fix missing reciprocal feature IDs
        c    Clear feature IDs

-O str
    Missing prot-ref name

-P str
    Publication options, per the flags in str:
        a    Remove All publications
        s    Remove Serial number
        f    Remove Figure, numbering, and name
        r    Remove Remark
        u    Update PMID-only publication
        j    Look up ISO Journal title abbreviation
        m    Merge identical publication features
        #    Replace unpublished with PMID

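The # flag needs care on the command line: in a POSIX shell, an unquoted word that begins with # starts a comment, so typing cleanasn -P # passes -P with no argument. Quote it as '#'. When a script assembles the command as a string, Python's shlex.join applies that quoting automatically, as this sketch shows (the file names are placeholders).

```python
import shlex

# Placeholder file names; "#" is the -P flag "Replace unpublished with PMID".
argv = ["cleanasn", "-i", "in.asn", "-o", "out.asn", "-P", "#"]

command = shlex.join(argv)  # quotes "#" so a shell does not read it as a comment
print(command)  # cleanasn -i in.asn -o out.asn -P '#'

# Parsing with shell-style comment handling recovers the original argv
# only because "#" was quoted; the unquoted form loses the flag.
assert shlex.split(command, comments=True) == argv
assert shlex.split("cleanasn -P #", comments=True) == ["cleanasn", "-P"]
```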
-Q str
    Report, per the flags in str:
        c    Record count
        r    ASN.1 BSEC report
        s    ASN.1 SSEC report
        n    NORM vs. SSEC report
        e    PopPhyMutEco AutoDef report
        o    Overlap report
        l    Latitude-longitude country diff
        d    Log SSEC differences
        g    GenBank SSEC diff
        f    asn2gb/asn2flat diff
        h    Seg-to-delta GenBank diff
        v    Validator SSEC diff
        m    Modernize Gene/RNA/PCR
        u    Unpublished Pub lookup
        p    Published Pub lookup
        j    Unindexed Journal report
        t    tRNA anticodon report
        w    Component offset report
        x    Custom scan

-R
    Remote fetching from ID (NCBI sequence databases)

-S str
    Selective difference filter, per the flags in str (capital letters skip):
        s    SSEC
        b    BSEC
        A    Author
        p    Publication
        l    Location
        r    RNA
        q    Qualifier sort order
        g    GenBank block
        k    Package CdRegion or parts features
        m    Move publication
        o    Leave duplicate Bioseq publication
        d    Automatic definition line
        e    Pop/Phy/Mut/Eco Set definition line

-T
    Taxonomy Lookup

-U str
    Modernize, per the flags in str:
        g    Genes
        r    RNA
        p    PCR Primers

-V str
    Remove features by validator severity, per the flags in str:
        r    Reject
        e    Error
        w    Warning
        i    Info

-X str
    Miscellaneous options, per the flags in str:
        d    Automatic definition line
        s    Automatic definition line with Source qualifiers
        e    Pop/Phy/Mut/Eco Set definition line
        n    Instantiate NC title
        m    Instantiate NM titles
        x    Special XM titles
        p    Instantiate Protein titles
        g    GPipe instantiate titles
        c    Create mRNAs for coding sequences
        f    Fix reciprocal protein_id/transcript_id
        v    Revert preRNA or ncRNA transcript_id
        t    Parse anticodon from sequence
        b    Batch cleanup of multireader output
        z    Wrap SegSet with NucProt set
        w    GFF/WGS genome cleanup

-Z str
    Remove indicated User-object

-a str
    ASN.1 type:
        a    Any (default)
        e    Seq-entry
        b    Bioseq
        s    Bioseq-set
        m    Seq-submit
        t    Batch Bioseq-set
        u    Batch Seq-submit

-b
    Input ASN.1 is Binary

-c
    Input ASN.1 is Compressed

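The -a, -b, and -c options together describe the input. A wrapper can derive them from three settings, as in this Python sketch. It assumes that omitting -a is equivalent to -a a, since Any is listed as the default; input_format_args is a hypothetical helper.

```python
# Type letters and names copied from the -a list above.
ASN_TYPES = {
    "a": "Any (default)",
    "e": "Seq-entry",
    "b": "Bioseq",
    "s": "Bioseq-set",
    "m": "Seq-submit",
    "t": "Batch Bioseq-set",
    "u": "Batch Seq-submit",
}


def input_format_args(asn_type: str = "a", binary: bool = False,
                      compressed: bool = False) -> list[str]:
    """Return the cleanasn options that describe the input ASN.1."""
    if asn_type not in ASN_TYPES:
        raise ValueError(f"unknown -a type {asn_type!r}")
    args = []
    if asn_type != "a":      # assumed: "a" (Any) is the default, so omit it
        args += ["-a", asn_type]
    if binary:
        args.append("-b")    # input ASN.1 is binary
    if compressed:
        args.append("-c")    # input ASN.1 is compressed
    return args
```

For instance, a compressed binary Seq-submit would be described by input_format_args("m", binary=True, compressed=True), which yields -a m -b -c.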
-d str
    Source database:
        a    Any (default)
        g    GenBank
        e    EMBL
        d    DDBJ
        b    EMBL or DDBJ
        i    INSD
        r    RefSeq
        n    NCBI
        x    Exclude EMBL/DDBJ
        y    Exclude gbcon, gbest, gbgss, gbhtg, gbpat, gbsts

-f str
    Substring filter

-i filename
    Single input file (defaults to stdin)

-j filename
    First filename

-k filename
    Last filename

-m str
    Flatfile mode:
        r    Release
        e    Entrez
        s    Sequin
        d    Dump

-n path
    asn2flat executable (default is /netopt/ncbi_tools/bin/asn2flat)

-o filename
    Single output file (defaults to stdout)

-p path
    Process all matching files in path

-q path
    ffdiff executable (default is /netopt/genbank/subtool/bin/ffdiff)

-r path
    Path for results

-v path
    asnval executable (default is /netopt/ncbi_tools/bin/asnval)

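The -n, -q, and -v defaults point at fixed /netopt paths that may not exist on a given machine. A wrapper can check each default and otherwise fall back to a copy found on PATH, then pass the result explicitly. This Python sketch assumes the tools are installed under the same names (asn2flat, ffdiff, asnval); resolve_tool and tool_args are hypothetical helpers.

```python
import os
import shutil
from typing import Optional

# Defaults documented for -n, -q, and -v.
DEFAULT_TOOLS = {
    "-n": "/netopt/ncbi_tools/bin/asn2flat",
    "-q": "/netopt/genbank/subtool/bin/ffdiff",
    "-v": "/netopt/ncbi_tools/bin/asnval",
}


def resolve_tool(default_path: str) -> Optional[str]:
    """Return default_path if it is an executable file, else a PATH copy, else None."""
    if os.path.isfile(default_path) and os.access(default_path, os.X_OK):
        return default_path
    # Assumption: the tool is installed on PATH under the same base name.
    return shutil.which(os.path.basename(default_path))


def tool_args() -> list[str]:
    """Build explicit -n/-q/-v options for every tool that could be located."""
    args = []
    for option, default_path in DEFAULT_TOOLS.items():
        found = resolve_tool(default_path)
        if found is not None:
            args += [option, found]
    return args
```

Tools that cannot be found are simply left out, so cleanasn falls back to its own defaults for them.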
-x ext
    File selection suffix for use with -p (defaults to .ent)

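With -p, cleanasn processes every matching file in a directory, and -x sets the selection suffix (default .ent). Before a large batch run it can help to list which files a suffix would pick up. This Python sketch assumes a simple "name ends with the suffix" test on the top-level directory only; whether cleanasn descends into subdirectories is not stated here, and matching_files is a hypothetical name.

```python
import tempfile
from pathlib import Path


def matching_files(path: str, suffix: str = ".ent") -> list[Path]:
    """List regular files directly under path whose names end with suffix.

    Approximates -p/-x selection; recursion into subdirectories is not
    modeled because the documentation does not say whether it happens.
    """
    return sorted(p for p in Path(path).iterdir()
                  if p.is_file() and p.name.endswith(suffix))


if __name__ == "__main__":
    # Demonstrate on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.ent", "b.ent", "notes.txt"):
            (Path(tmp) / name).touch()
        print([p.name for p in matching_files(tmp)])  # ['a.ent', 'b.ent']
```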
The National Center for Biotechnology Information.
asndisc(1), asnval(1), sequin(1).