Atlases of oligonucleotide parameters
Images are created by the Java program JCircleGraph using data from two programs, OligoWords and OligoCounter.
Interpreting an image
The dark grey inner ring is the scale, with ticks every 0.25 Mb or 0.5 Mb depending on genome size.
The four innermost rings are mononucleotide or tetranucleotide (4mer) parameters derived from a 10kb sliding window using the Python program OligoWords (see the downloads page on this server, or Reva and Tümmler 2004 for the program and a full description of parameters).
1: GC content (proportion of G and C in one window)
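As an illustration of what a sliding-window GC content means, the following is a minimal sketch in Python 3, not the OligoWords implementation. The function name gc_content_windows is hypothetical, the sequence is treated as linear (no wrap-around at the end of a circular genome), and the defaults mirror the frame=10000 and step=5000 values used on the OligoWords command line further down this page.

```python
def gc_content_windows(sequence, frame=10000, step=5000):
    """Yield (start, gc_fraction) for every full window along the sequence.

    Assumption: the sequence is linear; a real tool may wrap around
    the origin of a circular genome, which this sketch does not do.
    """
    sequence = sequence.upper()
    for start in range(0, len(sequence) - frame + 1, step):
        window = sequence[start:start + frame]
        gc = window.count("G") + window.count("C")
        yield start, gc / frame


# Toy example with a tiny window so the result is easy to check by hand.
toy = "GGCCAATTGC"
print(list(gc_content_windows(toy, frame=4, step=2)))
# [(0, 1.0), (2, 0.5), (4, 0.0), (6, 0.5)]
```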
Example: a single 14mer in a 5000bp region. (14 / 5000) * 100 = 0.28% occupancy.
Example: 2 half-overlapping 8mers in a 5000bp region. Together they cover 8 + 4 = 12 bases, so (12 / 5000) * 100 = 0.24% occupancy. Percentage occupancy reduces the amount of redundancy in a dataset with overlapping repeats. Overlapping repeats are characteristic of OligoCounter output, since it uses a window size of 1bp.
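The two worked examples above can be reproduced with a short Python sketch. It is only an illustration of the arithmetic, not code taken from OligoCounter or JCircleGraph; the function name percentage_occupancy and the (start, length) input format are assumptions made for this example.

```python
def percentage_occupancy(hits, region_length=5000):
    """Return the percentage of bases in a region covered by at least one oligo.

    hits is a list of (start, length) pairs (a hypothetical input format).
    Each base is counted once, however many oligos overlap it.
    """
    covered = set()
    for start, length in hits:
        covered.update(range(start, start + length))
    return 100.0 * len(covered) / region_length


print(percentage_occupancy([(100, 14)]))            # single 14mer: 0.28
print(percentage_occupancy([(100, 8), (104, 8)]))   # two half-overlapping 8mers: 0.24
```

Counting covered positions in a set is what removes the redundancy: the four bases shared by the two 8mers contribute only once.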
5: Percentage occupancy of the bases in each 5kbp region by overrepresented 8-14bp oligos at the default chi-squared level of 500
Filtering: If an inappropriately high chi squared value has been selected for rings 5 or 6, few or no data will be available, resulting in a dark blue ring which dominates the rest of the graph. To avoid this, where the average minus one standard deviation for a ring is less than zero (which is the case when few data are available), the whole ring is filtered and left grey. The solution is to use a lower chi squared value (i.e. rerun OligoCounter).
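The filtering rule can be sketched in a few lines of Python. This is an illustration of the stated condition only; the function name ring_is_filtered is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the page does not say which one JCircleGraph uses.

```python
from statistics import mean, pstdev


def ring_is_filtered(values):
    """Return True when average minus one standard deviation is below zero.

    Assumption: population standard deviation (pstdev) is used.
    """
    return mean(values) - pstdev(values) < 0


# Sparse ring: almost every region has zero occupancy, so the ring is filtered.
sparse = [0.0] * 99 + [5.0]
# Well-populated ring: occupancy values cluster around a positive average.
dense = [0.20, 0.30, 0.25, 0.28]

print(ring_is_filtered(sparse))  # True
print(ring_is_filtered(dense))   # False
```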
The correlation class circle, ring 7, indicates the differences between tetranucleotides - in this case oligonucleotide variance in ring 4 - and the innermost 8-14mer percentage occupancy in ring 5.
7: 4mer-8mer correlation class derived from rings 4 (OUV, 4mer) and 5 (% occupancy, 8-14mers)
Colours range from dark blue (below average) through light grey (average) to dark red (above average). These colours cover 3 standard deviations above and 3 below the average, and thus over 99% of normally distributed data. Extremes that do not lie within 3 standard deviations are coloured more emphatically. By using this colour dimension, regions of the genome which diverge from the average in various parameters can be clearly seen. These may be genome islands, integrated phages, horizontally transferred genomic elements, rDNA or repeat regions.
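A colour scale of this kind can be sketched as a mapping from the number of standard deviations away from the average to a point on a blue-grey-red gradient. The Python below is a rough illustration under stated assumptions, not the JCircleGraph code: the function name z_score_colour and the three RGB values are invented for the example, the interpolation is linear, and values beyond 3 standard deviations are simply clamped to the end colours rather than given the more emphatic colours described above.

```python
# Assumed RGB end points; the real program's colours are not documented here.
DARK_BLUE = (0, 0, 139)
LIGHT_GREY = (211, 211, 211)
DARK_RED = (139, 0, 0)


def z_score_colour(value, average, sd):
    """Map a value to an (r, g, b) tuple on a blue-grey-red scale."""
    if sd == 0:
        return LIGHT_GREY  # no spread in the data: everything is "average"
    z = (value - average) / sd
    z = max(-3.0, min(3.0, z))   # clamp to the +/- 3 SD range
    t = abs(z) / 3.0             # 0.0 at the average, 1.0 at 3 SD
    end = DARK_RED if z > 0 else DARK_BLUE
    return tuple(round(g + (e - g) * t) for g, e in zip(LIGHT_GREY, end))


print(z_score_colour(10, 10, 2))  # average        -> (211, 211, 211)
print(z_score_colour(16, 10, 2))  # 3 SD above     -> (139, 0, 0)
print(z_score_colour(4, 10, 2))   # 3 SD below     -> (0, 0, 139)
```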
How to create JCircleGraph images
The following files are needed to create a circle graph:
Four output files from the Python program OligoWords
Two output files from the Java program OligoCounter.
Obtain fasta files (.fna) from the NCBI RefSeq collection.
java -Xmx500m -jar JCircleGraph.jar
You can run the jar file by double-clicking on it on some Windows PCs; however, the memory assigned by default is not sufficient to run the program properly. It is therefore best to run it from the command line with the -Xmx500m switch, which raises the maximum Java heap size to 500 megabytes.
JCircleGraph requires all 4 tetranucleotide parameters from OligoWords, otherwise it will refuse to work. The dropdown list of available genome RefSeqs is currently derived from the resultsPositions files from OligoCounter, so you need to have these files in the same directory.
How to create the OligoWords output files
Obtain the Python command line program OligoWords (see Reva and Tümmler 2004).
Install the Python programming language if necessary
To check, type python at the command line (i.e. a DOS prompt or shell); feedback such as this:
"Python 2.4.3 (#1, Oct 23 2006, 14:19:47)"
indicates Python is installed.
Add all files to the working directory
Run OligoWords from the command line four times with the four necessary parameters as below.
python OligoWords1.2.exe.py task=n0_4mer:XXX, frame=10000, step=5000
Where XXX = GC, D, PS, V for each respective run
After each run, manually append the parameter name to the end of the output filename; otherwise the file will be overwritten by the next run. For example:
NC_006156.out -> NC_006156gc.out,
Move all these created files into the working directory or .......
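To keep track of the four runs and the renaming between them, a small Python helper can list each command together with the filename its output should be renamed to. This is only a planning sketch: it prints the steps and does not run OligoWords. The command text is copied from the line above, only the gc suffix comes from the example on this page, and the other three suffixes (d, ps, ouv) as well as the function names are assumptions.

```python
import os

# Parameter name -> filename suffix. Only "gc" is taken from the example
# above; the other three suffixes are assumptions made for this sketch.
SUFFIXES = {"GC": "gc", "D": "d", "PS": "ps", "V": "ouv"}

COMMAND_TEMPLATE = (
    "python OligoWords1.2.exe.py task=n0_4mer:{param}, frame=10000, step=5000"
)


def renamed_output(raw_name, param):
    """Insert the parameter suffix before the file extension."""
    stem, ext = os.path.splitext(raw_name)
    return stem + SUFFIXES[param] + ext


def plan(raw_name):
    """Return (command, renamed_file) pairs, one per OligoWords run."""
    return [
        (COMMAND_TEMPLATE.format(param=param), renamed_output(raw_name, param))
        for param in SUFFIXES
    ]


for command, target in plan("NC_006156.out"):
    print(command)
    print("  then rename NC_006156.out ->", target)
```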
Alternatively, if you are analysing many genomes and do not want to rename each file individually, create the following directories in the directory where you plan to use JCircleGraph, and place the raw output files from OligoWords into the relevant directory (take care not to mix them up):
gc, d, ps, ouv
How to create the OligoCounter output files
Run OligoCounter (see tutorial elsewhere on this site) with chi squared thresholds of 500 and 1200.
java -Xmx1500m -jar OligoCounter.jar
Move resultsPositionsNC_xxxxxx_chisq.txt data files to the JCircleGraph directory.
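When many genomes are processed, the move can be scripted. The following Python sketch collects every file whose name starts with resultsPositions and ends in .txt and moves it to a target directory. The function name move_results_positions and both directory arguments are hypothetical; adjust them to wherever OligoCounter wrote its output and wherever JCircleGraph.jar lives.

```python
import glob
import os
import shutil


def move_results_positions(source_dir, target_dir):
    """Move all resultsPositions*.txt files from source_dir to target_dir.

    Returns the list of destination paths, sorted by filename.
    """
    moved = []
    pattern = os.path.join(source_dir, "resultsPositions*.txt")
    for path in sorted(glob.glob(pattern)):
        destination = os.path.join(target_dir, os.path.basename(path))
        shutil.move(path, destination)
        moved.append(destination)
    return moved
```

Calling move_results_positions with the OligoCounter output directory and the JCircleGraph directory leaves all other files where they are.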