NAME

coloredChromosomes.pl - A program to automate complex ideogram drawing


SYNOPSIS

coloredChromosomes.pl [--help] [--chromosomeSpec=path] [--placement=placement name] [--labelFile=labelFile] [--labelMap=path] [--labelPlacement=placement name] [--o=output path]


DESCRIPTION

coloredChromosomes.pl is a program designed to draw chromosomal ideograms. It reads a specification file, given by --chromosomeSpec, which describes the chromosomal layout and banding pattern for some organism. The option --placement chooses a placement pattern for the chromosomes on the paper out of a set of patterns present in the configuration file. The contents of the configuration file are described below. --labelFile denotes the path of an additional file which holds annotation information to draw within or alongside the chromosomal shapes. For one such labelFile the option --labelPlacement specifies a section in the main configuration file which gives further options to control annotation. If annotations are complex, --labelMap enables a further level of indirection: chromosomal locations are stored by name in the labelMap and labels refer to them by that name.

The main specification file (MSF)

This section explains the format of the specification file. It is recommended to modify the supplied specification file rather than building one from scratch. If in doubt, you may find it helpful to change individual parameters to see their effect. The main specification file is in PropertyList format (see PropertyList). A global, unnamed dictionary contains the list chromosomes and the dictionaries placements, labelPlacements and bandNamePlacements.

MSF: chromosomes

The entry chromosomes in the MSF is a list, each entry of which describes a single chromosome. Each entry is a dictionary containing the keys name, p, q, pBanding and qBanding. The value of name is a string which is used to refer to this chromosome. This string is also printed under the chromosome in the final program output. p is a number which specifies the length of the p-arm of this chromosome and q is the length of the q-arm. The units are arbitrary, but have to be consistent over all chromosomes so that relative sizes are drawn properly. pBanding and qBanding each refer to a list which describes the banding pattern of the respective chromosomal arm. Each list entry is a dictionary containing the keys name, start, stop and color. The following example describes a single chromosome:

 {
        name = "1";
        p = "375.3212";
        q = "389.9321";
        pBanding = (
                { name = "11"; color = centromere;
                        stop = "0.0122"; start = "0.0000"; },
                { name = "12"; stop = "0.0365";
                        start = "0.0122"; color = black; }
        );
        qBanding = (
                { name = "11"; color = centromere;
                        stop = "0.0211"; start = "0.0000"; }
        );
 }

Each entry in either pBanding or qBanding is a dictionary and describes a single band. The keys are name, color, start and stop. Again, name is used both to refer to the band and as the name listed in the figure produced. color may be either the name of a color which is defined elsewhere or a dictionary containing the keys r, g and b specifying a color in RGB space (values range from 0 to 1). There are predefined colors with the names black, white, grey and centromere. The color centromere is special in that it doesn't refer to a plain color, but to a pattern, which can be used to highlight the centromeric region. Banding data for human chromosomes is supplied with the standard configuration file.

MSF: placements

Another section in the MSF describes the actual placement of chromosomes on a page. In the main dictionary the entry placements refers to a dictionary whose keys are the names of arbitrarily many placements, each of which refers to a dictionary containing the actual placement specification. This way one can collect many useful placement schemes in one configuration file and refer to them by the --placement option. If no placement is specified by the --placement option, the placement with the name standard is chosen.

MSF: placements:lanes

The layout of chromosomes is specified by grouping chromosomes together and designating a lane to display groups of chromosomes. The key used to specify the layout is lanes. Example:

        lanes = (
                ( (1, 2, 3), (4, 5) ),
                ( (6, 7, 8, 9, 10, 11, 12) ),
                ( (13, 14, 15), (16, 17, 18) ),
                ( (19, 20, 21, 22), (X), (Y) )
        );

lanes is a list in which each entry is itself a list specifying a lane of chromosomes. The above example will display chromosomes 1, 2, 3, 4 and 5 in the first lane, chromosomes 6 to 12 in lane 2 and so on. The fact that each lane is a list of lists means that within each lane chromosomes are grouped. In the example chromosomes 1, 2 and 3 are grouped together in lane 1, as are chromosomes 4 and 5 (a further complication, subgrouping, is explained below under Advanced topics: subgrouping).
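
To make the nesting concrete, the following Python sketch models the lanes example as nested lists and walks it lane by lane and group by group. It is an illustration only and not part of the package; the function name iter_lanes is hypothetical.

```python
# Illustrative model of the `lanes` example (not part of coloredChromosomes.pl).
lanes = [
    [["1", "2", "3"], ["4", "5"]],
    [["6", "7", "8", "9", "10", "11", "12"]],
    [["13", "14", "15"], ["16", "17", "18"]],
    [["19", "20", "21", "22"], ["X"], ["Y"]],
]


def iter_lanes(lanes):
    """Yield (lane number, group number, chromosome name), counting from 1."""
    for lane_no, lane in enumerate(lanes, start=1):
        for group_no, group in enumerate(lane, start=1):
            for name in group:
                yield lane_no, group_no, name


# One tuple per chromosome, in drawing order.
placement = list(iter_lanes(lanes))
```

Here chromosome 4 comes out as (1, 2, "4"): first lane, second group.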

MSF: placements:distances

The actual sizes of chromosomes are determined by their relative sizes together with absolute sizes specifying borders and other elements. The following table lists the keys of a placement which specify absolute sizes. All measurements are in points (1/72 of an inch).

   Table 1: Absolute sizes to determine chromosome placements
   Key                Default  Explanation of value
                      value
 ---------------------------------------------------------------------
   borderLeft         70       Left border of the printed page
   borderTop          70       Top border of the printed page
   borderButtom       70       Bottom border of the printed page
   chromosomeWidth    10       Width of a chromosome
   spacing            70       Space between chromosomes
   intergroupSpacing  40       Additional space between groups of
                               chromosomes
   paperSize          dict     A dictionary containing the keys
                               width and height of the whole page
                               e.g. paperSize = { width = 595;
                               height = 842; }; for a DIN A4 page
   outlineWidth       0        The width of the chromosome outline.
                               For PostScript output a size of 0 means
                               that the line is drawn as small as the
                               output device permits ("1 pixel").
   namesize           14       The font size of chromosome names

Figure 1 illustrates most sizes graphically. Provided that only 3 chromosomes are displayed altogether within a single lane and that chromosomes 1 and 2 are grouped, the following identities hold: [a]: borderTop, [b]: borderLeft, [c]: spacing, [d]: intergroupSpacing, [e]: borderBottom, [f]: chromosomeWidth. Note that the right border is determined by the sizes [b], [c], [d] and [f] and therefore doesn't need to be specified.

              ^
              |                                    |[f]|
             [a]                                   \   /
              |                                     | |
             \/
             .-.           .-.                      .-.
<-   [b]  -> | | <- [c] -> | | <- [c] ->  <- [d] -> | | 
             | |           | |                      | |
             | |           | |                      | |
             `-'           `-'                      `-'
             .-.           .-.                      .-.
             | |           | |                      | |
             | |           | |                      | |
             | |           | |                      | |
             `-'           `-'                      `-'
              1             2                        3
              ^
              |
             [e]
              |
             \/
  Figure 1: Graphical illustration of size specifications
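
The horizontal arithmetic suggested by Figure 1 can be sketched in Python as follows. This is a reading of the figure, not the program's code: it assumes that spacing is measured between the facing edges of neighbouring chromosomes and that intergroupSpacing is added on top of spacing where a new group starts. The function name left_edges is hypothetical; the default values are those of Table 1.

```python
def left_edges(lane, border_left=70, chromosome_width=10,
               spacing=70, intergroup_spacing=40):
    """Return the x coordinate (in points) of each chromosome's left edge.

    `lane` is a list of groups, each group a list of chromosome names.
    Assumption: spacing is edge to edge; intergroup_spacing is additional.
    """
    edges = {}
    x = border_left
    first = True
    for group in lane:
        for index, name in enumerate(group):
            if not first:
                # Step over the previous chromosome and the regular gap.
                x += chromosome_width + spacing
                if index == 0:
                    # First chromosome of a new group: add the extra gap.
                    x += intergroup_spacing
            edges[name] = x
            first = False
    return edges


# The situation of Figure 1: chromosomes 1 and 2 grouped, 3 on its own.
edges = left_edges([["1", "2"], ["3"]])
```

Under these assumptions the three chromosomes of Figure 1 start at 70, 150 and 270 points from the left paper edge.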

MSF: placements:colors

In addition to sizes, colors may be specified. backgroundColor and outlineColor specify the respective colors. Chromosome names are drawn in the color outlineColor.

Annotations

In the context of this software package all graphical elements besides chromosomal shapes and names are called annotations. Annotations are further subdivided into internal and external annotations. The difference is that internal annotations are confined to the chromosomal shape, whereas external annotations are free to draw anywhere on the page. This behaviour is guaranteed by the PostScript clip operator. Each annotation is performed by a specific Perl class. For completeness the supplied annotation classes are discussed here rather than in the respective class documentation. For each placement configuration there are two lists, with the keys internalAnnotations and annotations, which denote the kinds of annotation to be applied. Each list contains dictionaries with the keys name and config. name specifies the type of annotation and config provides additional information forwarded to the annotation module. Possible annotations are enumerated below.

Annotations: banding

The banding annotation is an internal annotation which draws a banding pattern within the chromosome shape as provided by the chromosome definitions (see MSF: chromosomes). The config dictionary may contain the key drawBandingWithVisualEffect, whose value may be chrome. This option imitates a chrome effect.

Annotations: bandingNames

This type of annotation draws band names on the left-hand side of chromosomes. The config dictionary must contain the key config, whose value is a name which refers to another dictionary. A key with this name must exist in the dictionary bandNamePlacements, which in turn is located in the main dictionary. This way a specification of how to place banding names can be shared amongst several chromosomal placement schemes. The annotation dictionary could therefore read:

 {
        name = bandingNames;
        config = { config = standard; };
 }

The config dictionary referred to by name must contain the following keys: font, distance, color, tags. font refers to a dictionary containing font name and size (e.g. 'font = { name = Helvetica; size = 4; }'). distance denotes the space between the chromosome and the band name (distance [a] in Fig. 2). color specifies which color is used to draw the lines and names. Again it may be a name or a dictionary specifying the RGB values of the color. Finally, tags specifies to which chromosomes band names should be attached (this is only relevant when using subgroups; see below). For most cases the value 'tags = ( "" );' should suffice. The size [c] (Fig. 2) is the length of the line connecting the band name and the middle of the band if an overlap has to be avoided. This distance can be specified by the optional labelInset key and defaults to 3 times the font size.

                          .-.
                          | |
            band1 <-[a]-> | |
   band2 ---------        | | 
         |<-[c]->|        `-'
                          .-.
                          | |
                          | |
                          | |
                          `-'
 Figure 2: Distances involved in placing band names.

Annotations: plain

A very simple kind of internal annotation is the plain annotation. In the config dictionary it expects a key color which denotes a color to fill the interior of the chromosome (for the key tags, see "Advanced topics: subgrouping" below). For example:

 {
        name = plain;
        config = { color = grey; };
        tags = ( "" );
 }

Annotations: itags

To highlight distinct chromosomal positions the internal annotation itags may be used. The config dictionary contains the key defaultColorHeight which denotes the vertical size of rectangular marks printed inside the chromosomal shape. This size is relative to the largest chromosomal height on the page. Since chromosomes are scaled according to the available space, the relative size of marks stays constant no matter to what size chromosomes are stretched.

 {      name = itags;
        config = {
                /* height in percent of maximal chromosome length */
                defaultColorHeight = 0.0005;
        };
        tags = ( "" );
 }

The information on where to place marks and what color to give them is stored in a separate file, which is specified via the --labelFile option. This file is in PropertyList format. A main dictionary contains a list referred to by labels, and the dictionaries called colors and labelProperties. The following example illustrates the file format:

 {
        labels = (
                { v = "1.000000"; tag = "H";
                  l = "D10S582"; p = "10p21.3:0.5"; },
                { v = "1.000000";
                  l = "D17S953"; p = "10p21.3:0.6"; }
        );
        colors = {
                interpolationList = (
                        { value = "0"; color =
                          { g = "1"; b = "0"; r = "1"; }; },
                        { value = "0.05"; color =
                          { g = "1"; b = "0"; r = "0"; }; },
                        { value = "0.06"; color =
                          { g = "0"; b = "0"; r = "1"; }; },
                        { value = "1"; color =
                          { g = "0"; b = "1"; r = "0"; }; }
                );
                defaultValue = "0";
                specialColor = { g = "0"; b = "1"; r = "0"; };
        };
        labelProperties = { selectSmallerEqual = ".05"; };
 }

Each label is a dictionary with the keys v, p, l and tag. v is a value between 0 and 1 and represents quantitative information about a locus whose position is specified by p. p may be either a chromosomal position in standard nomenclature or a numerical value (for a detailed explanation see Representing chromosomal locations). l and tag are textual annotations not used for internal tagging. The colors section of the label file describes how values are mapped to colors. One entry in this dictionary is an interpolationList. Each entry in that list is a dictionary with a key value and a key color. If the value of a label matches one of the values in the list, the respective color is chosen. If it is not in the list, the color is linearly interpolated between the two entries whose values are closest to it. The defaultValue designates the value used when a label entry has no value. specialColor is chosen if the label value is bigger than 1. labelProperties are only used for the etags annotation.
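
The mapping from values to colors described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's implementation: values are taken as numbers rather than the strings stored in the label file, and a value below the smallest list entry is clamped to that entry's color, which the text above does not specify. The function name interpolate_color is hypothetical.

```python
def interpolate_color(value, interpolation_list, special_color,
                      default_value=0.0):
    """Map a label value to an RGB dictionary.

    interpolation_list: list of {"value": number, "color": {"r", "g", "b"}}.
    """
    if value is None:
        value = default_value      # missing value: fall back to defaultValue
    if value > 1:
        return special_color       # values above 1 get specialColor
    points = sorted(interpolation_list, key=lambda entry: entry["value"])
    if value <= points[0]["value"]:
        return points[0]["color"]  # assumption: clamp below the first entry
    for low, high in zip(points, points[1:]):
        if low["value"] <= value <= high["value"]:
            span = high["value"] - low["value"]
            t = 0.0 if span == 0 else (value - low["value"]) / span
            # Linear interpolation, channel by channel.
            return {c: low["color"][c] + t * (high["color"][c] - low["color"][c])
                    for c in "rgb"}
    return points[-1]["color"]


# The interpolationList of the example label file, with numeric values.
example_list = [
    {"value": 0.0,  "color": {"r": 1, "g": 1, "b": 0}},
    {"value": 0.05, "color": {"r": 0, "g": 1, "b": 0}},
    {"value": 0.06, "color": {"r": 1, "g": 0, "b": 0}},
    {"value": 1.0,  "color": {"r": 0, "g": 0, "b": 1}},
]
special = {"r": 0, "g": 0, "b": 1}

halfway = interpolate_color(0.025, example_list, special)
```

With the example list a value of 0.025 lies halfway between the entries for 0 and 0.05, so the red channel comes out at one half while green stays at 1.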

Annotations: etags

The external annotation etags uses the same information as itags to draw textual information at specified positions beside chromosomes. A specification for etags annotations may look as follows:

 {
        name = etags;
        config = { config = standard; };
        tags = ( "" );
 }

The value of config within the config dictionary specifies an entry within the global dictionary with the key labelPlacements (compare the description of bandingNames annotations). An example is shown:

 standard = {
        height = 6;
        chromosomeDistance = 15;
        curveInset = 3;
        font = { name = Helvetica; size = 4; };
        doPrintLabelText = YES;
 };

Figure 3 illustrates the distances involved in drawing the labels. [a] corresponds to curveInset, [b] to height and [c] to chromosomeDistance. The height parameter defines a rectangular region centered around the label which is kept free of other label texts. If two rectangular regions associated with different labels overlap, the positions of the labels are changed to eliminate the overlap. This is done by a locally optimizing algorithm which minimizes the amount of movement. A Bezier curve is drawn which connects the chromosomal position and the label text. You can suppress label printing (showing just the Bezier curves) by setting doPrintLabelText to NO.

  .-.                                    ____
  | |                                   /  ^
  | |                                __/   |
  | | <-[a]-> --------- <-[a]-> label__   [b]
  `-' <-         [c]         ->        \   |
  .-.                                   \_\/__
  | |                                    
  | |
  | |
  `-'
 Figure 3: Distances involved in placing labels.

Annotations: sideline

This annotation represents a numerical value that varies across chromosomal locations.

Representing chromosomal locations

The annotations itags and etags need to specify chromosomal locations. There are several ways to give these positions. In the label file each label entry may contain the key p, which can have either of two formats. First, it is possible to use "ChromosomeArmBandname:fraction", where Chromosome must be contained in the chromosomes list, Arm must be either 'p' or 'q', and Bandname must be one of the band names defined for the chromosome. Finally, fraction indicates where within the band the position is located. fraction ranges from 0 to 1, with 0 indicating the centromeric start of the band and 1 denoting the most telomeric position within the band. Alternatively, the position p may be specified by a single fraction ranging over the whole chromosome. The format is the fraction, immediately followed by 'c', followed by a chromosome name (e.g. '0.2172c11'). Judging by the example below, where 0.999278c11 lies in band 11q25, a fraction of 0 indicates the most telomeric position on the 'p' arm and a fraction of 1 the most telomeric position on the 'q' arm. A further level of indirection is possible. The option --labelMap specifies a file containing positions only. A global dictionary contains labels as keys, each of which points to a dictionary which must contain the key p, specifying a position in either of the formats described above. An example is shown:

 {
        D11S4111 = { p = "0.811944c11"; band = "11q23.1"; };
        D11S4112 = { p = "0.999278c11"; band = "11q25"; };
        D12S1070 = { p = "0.725909c12"; band = "12q23.1"; };
 }

If no position is given in the label file as supplied to the itags and etags annotations the program tries to look up the label name in the map file dictionary. This way you may generate a label file from some experimental data and keep that file separate from the positioning information.
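
The two position formats and the fallback to the map file can be sketched in Python as follows. This is an illustration, not the program's parser. It assumes that chromosome names in the band format are either digits or X/Y, and that the l entry of a label is the name looked up in the map; the function names parse_position and resolve_position are hypothetical.

```python
import re

# "ChromosomeArmBandname:fraction", e.g. "10p21.3:0.5"
# Assumption: chromosome names are digits or X/Y.
BAND_FORMAT = re.compile(
    r"^(?P<chrom>\d+|[XY])(?P<arm>[pq])(?P<band>[^:]+):(?P<fraction>\d*\.?\d+)$")
# fraction, then 'c', then chromosome name, e.g. "0.2172c11"
WHOLE_FORMAT = re.compile(r"^(?P<fraction>\d*\.?\d+)c(?P<chrom>.+)$")


def parse_position(p):
    """Parse a position string in either of the two formats."""
    m = BAND_FORMAT.match(p)
    if m:
        return {"kind": "band", "chromosome": m["chrom"], "arm": m["arm"],
                "band": m["band"], "fraction": float(m["fraction"])}
    m = WHOLE_FORMAT.match(p)
    if m:
        return {"kind": "chromosome", "chromosome": m["chrom"],
                "fraction": float(m["fraction"])}
    raise ValueError("unrecognized position: %r" % p)


def resolve_position(label, label_map):
    """Use the label's own p if present, otherwise look the label up by name."""
    p = label.get("p")
    if p is None:
        # Assumption: the label text `l` is the key used in the map file.
        p = label_map[label["l"]]["p"]
    return parse_position(p)


label_map = {"D11S4111": {"p": "0.811944c11", "band": "11q23.1"}}
from_band = parse_position("10p21.3:0.5")
from_map = resolve_position({"v": "1.000000", "l": "D11S4111"}, label_map)
```

A label without its own p entry is thus positioned purely from the map, which keeps experimental values and positioning information in separate files.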

Advanced topics: subgrouping

With the capabilities presented so far only simple ideograms are possible. For example, it is difficult to produce diploid ideograms. To accomplish arrangements like this it is possible to use a variant of the chromosomal placements (MSF: placements). The following example illustrates the concept of subgrouping:

 lanes = (
        ( ((1, 1_A), (2, 2_A), (3, 3_A) ), ( (4, 4_A), (5, 5_A)) ),
        ( ((6, 6_A), (7, 7_A), (8, 8_A), (9, 9_A), (10, 10_A),
           (11, 11_A), (12, 12_A)) ),
        ( ((13, 13_A), (14, 14_A), (15, 15_A)), ((16, 16_A),
           (17, 17_A), (18, 18_A))),
        ( ((19, 19_A), (20, 20_A), (21, 21_A), (22, 22_A)),
          ((X, X_A)), ((Y, Y_A)) )
 );

In contrast with simple grouping, subgrouping uses lists within groups of chromosomes. A new distance in the placement dictionary called intergroupSpacing specifies the distance between chromosomes in subgroups. You can also attach an alphanumerical tag to a chromosome name, separated by an underscore. The tag is used later to refer to specific chromosomes when several chromosomes with the same name exist or when a specific set of chromosomes needs to be distinguished from the others (tags can also be used without subgrouping). In the list of annotations each dictionary contains an entry called tags, which is a list of strings. These strings designate the tags of the chromosomes to which the annotation in question should be applied. The string "" is used to specify plain chromosome names. In the above example the string "A" would apply to all second chromosomes within each subgroup.
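
How tags select chromosomes can be sketched in a few lines of Python. This is an illustration only; it assumes the tag is everything after the first underscore of a chromosome name, and the function names split_tag and annotation_applies are hypothetical.

```python
def split_tag(chromosome_name):
    """Split "1_A" into ("1", "A"); an untagged name gets the empty tag."""
    name, _, tag = chromosome_name.partition("_")
    return name, tag


def annotation_applies(chromosome_name, annotation_tags):
    """True if the chromosome's tag is listed in the annotation's tags."""
    return split_tag(chromosome_name)[1] in annotation_tags


subgroup = ["1", "1_A"]
# tags = ( "" ) selects only the untagged chromosome of the pair.
plain_only = [c for c in subgroup if annotation_applies(c, [""])]
# tags = ( "A" ) selects only the tagged one.
tagged_only = [c for c in subgroup if annotation_applies(c, ["A"])]
```

An annotation meant for both chromosomes of a pair would list both strings in its tags entry.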


INTERNALS

How the space is allotted

Horizontal space is allotted according to the absolute sizes described in the section MSF: placements:distances. This gives you fine control over the horizontal arrangement of chromosomes. The vertical space, however, is allotted relative to chromosomal sizes in order to use all available space on the page. First, all absolute vertical sizes are collected and subtracted from the available vertical space. Next, the maximal chromosomal size of each lane is calculated. The height of each lane is then chosen proportional to these maximal sizes such that the lanes together fill the available space.
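
The vertical allotment can be sketched in Python as follows. This is an illustration of the three steps above, not the program's code: which sizes count as absolute vertical sizes is left to the caller (the example uses only the top and bottom borders), a chromosome's size is taken as p + q, and the function name lane_heights is hypothetical.

```python
def lane_heights(paper_height, absolute_vertical_sizes, lanes, chromosomes):
    """Return the height in points allotted to each lane.

    lanes: list of lanes, each a list of groups of chromosome names.
    chromosomes: maps a name to {"p": length, "q": length} in arbitrary units.
    """
    # Step 1: subtract all absolute vertical sizes from the page height.
    available = paper_height - sum(absolute_vertical_sizes)
    # Step 2: maximal chromosome size (p + q) within each lane.
    lane_max = [
        max(chromosomes[name]["p"] + chromosomes[name]["q"]
            for group in lane for name in group)
        for lane in lanes
    ]
    # Step 3: scale so that the lanes together fill the available space.
    scale = available / sum(lane_max)
    return [length * scale for length in lane_max]


chromosomes = {
    "1": {"p": 100.0, "q": 150.0},
    "2": {"p": 100.0, "q": 100.0},
    "3": {"p": 50.0, "q": 75.0},
}
# A4 page height with 70 pt top and bottom borders.
heights = lane_heights(842, [70, 70], [[["1", "2"]], [["3"]]], chromosomes)
```

In this example the first lane is twice as tall as the second, because its largest chromosome is twice as long, and the two heights add up to the 702 points left after the borders.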

How to extend this software package

If you want to write new annotation classes you have to subclass the class CCAnnotation. The available methods are documented there. You should also have a look at the other annotation classes for reference. The plain annotation in particular is as simple as an annotation can be.


FILES

If not specified by the --chromosomeSpec option, the specification file is searched for in the following locations: