Source files
OligoCounter has been developed and used extensively within our group with NCBI RefSeq files. It uses the RefSeq accession (NC_000000) in the fasta header to name output files, since these accessions are unique and are useful for web databases and for cross-referencing other NCBI RefSeq files. OligoCounter can currently be used with fasta files from other sources, but the RefSeq header must be faked (see below). We plan to provide a more elegant solution using GI numbers (GenBank identifiers) in the future; the difficulty is that GI numbers cannot be matched as easily to the GFF files from the RefSeq collection that contain coding data (which can be used for OligoViz).
Using non-NCBI RefSeq source files
RefSeq accession numbers are parsed from the fasta file and used to name the output files. The file must contain a RefSeq header even if it is a non-RefSeq genome. Such a header can simply be faked by copying a complete RefSeq fasta header and modifying it. Alternatively, if OligoCounter does not find a RefSeq accession it will generate a fake one.
>gi|26986745|ref|NC_002947.3| Pseudomonas putida KT2440, complete genome
Changes: the NC_XXXXXX accession is the key part to modify, but the | characters must also be left in place.
>gi|0000000|ref|NC_111111.1| My genome description
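The header convention above can be illustrated with a minimal Python sketch. It is not OligoCounter's own parser: the regular expression (NC_ followed by six digits and an optional version, between | characters) and the helper names refseq_accession and fake_header are assumptions made for illustration only.

```python
import re

# Assumed pattern: "|ref|NC_" + six digits + optional ".version" + "|".
REFSEQ_RE = re.compile(r"\|ref\|(NC_\d{6})(?:\.\d+)?\|")


def refseq_accession(header):
    """Return the NC_XXXXXX accession from a fasta header, or None if absent."""
    match = REFSEQ_RE.search(header)
    return match.group(1) if match else None


def fake_header(description, accession="NC_111111", version=1):
    """Build a RefSeq-style header for a non-RefSeq genome (hypothetical helper)."""
    return f">gi|0000000|ref|{accession}.{version}| {description}"


original = ">gi|26986745|ref|NC_002947.3| Pseudomonas putida KT2440, complete genome"
print(refseq_accession(original))
print(fake_header("My genome description"))
```

A header without the | delimiters, such as ">My genome description", yields None under this pattern, which is why the | characters need to stay in place.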
Using NCBI RefSeq files
Get source files from the NCBI ftp site at the following link
OligoCounter requires at least one fasta file in .fna format. We have only tested the program on prokaryotic genomes smaller than 10 MB, but larger genomes can be analysed if more RAM is allocated (we suggest 2000 MB). We have successfully analysed 20 MB of human DNA.
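To judge whether a genome exceeds the tested size before running an analysis, the sequence length of a .fna file can be counted with a short Python sketch. The function name fasta_sequence_length and the threshold of 10,000,000 characters (taking 1 MB as one million bases) are assumptions for illustration and are not part of OligoCounter.

```python
TESTED_LIMIT = 10_000_000  # assumed: 10 MB taken as ten million sequence characters


def fasta_sequence_length(lines):
    """Count sequence characters, skipping header lines that start with '>'."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


# A tiny in-memory example; for a real file, pass an open file object instead.
example = [
    ">gi|0000000|ref|NC_111111.1| My genome description\n",
    "ACGTACGT\n",
    "ACGT\n",
]
total = fasta_sequence_length(example)
print(total, "bases;", "consider more RAM" if total > TESTED_LIMIT else "within tested size")
```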
Necessary input files:
.fna (pure fasta)
Recommended extra files:
GFF files provide the coding coordinates required to create multi-colour images from the results with further programs.
.gff (coding coordinates)
If you want to run analyses for all genomes, it is advisable to download compressed archives (tar.gz) of all currently publicly available genomes from the NCBI ftp site.
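Once such an archive has been downloaded, the relevant input files can be listed without unpacking everything. The following Python sketch uses the standard tarfile module; the function name genome_member_names and the archive name in the usage comment are hypothetical.

```python
import tarfile


def genome_member_names(archive_path):
    """List the .fna and .gff files inside a tar.gz archive without extracting it."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith((".fna", ".gff"))
        )


# Usage (hypothetical archive name):
# for name in genome_member_names("all_genomes.tar.gz"):
#     print(name)
```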