NAME

lociLocations.pl - A program to retrieve chromosomal locations by name


SYNOPSIS

lociLocations.pl [--help] [--locusColumn=columnName] [--path=path of database] [--locilistPath=locilistPath] [--retrieveLoci=loci names] [--anonymous] [--calculateGaps] [--aliases=aliasesPath] [--source=name]


DESCRIPTION

This program takes a list of markers and retrieves their physical and cytogenetic locations from a database. If called with the option --calculateGaps, it instead calculates a histogram of the gaps between the markers of the given set. This lets you check how densely the marker set saturates the genome. The loci to be retrieved are read either from STDIN or from a file specified by the --locilistPath option. This input is expected to be a tab-separated file (see options --locusColumn, --anonymous).
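The gap histogram can be sketched as follows. This is not the code of lociLocations.pl. It is a minimal Python illustration that assumes gaps are measured between adjacent loci on the same chromosome, sorted by physical position, and counted in bins of a fixed width. The input shape, (chromosome, position) pairs, the function name gap_histogram and the parameter bin_size are all hypothetical.

```python
from collections import Counter


def gap_histogram(loci, bin_size):
    """Count distances between adjacent loci on the same chromosome.

    loci: iterable of (chromosome, position) pairs (assumed input shape).
    bin_size: width of one histogram bin in base pairs (assumed parameter).
    Returns a dict mapping the lower bound of each bin to its gap count.
    """
    by_chromosome = {}
    for chromosome, position in loci:
        by_chromosome.setdefault(chromosome, []).append(position)
    counts = Counter()
    for positions in by_chromosome.values():
        positions.sort()
        # Gaps are only taken between neighbours on the same chromosome.
        for left, right in zip(positions, positions[1:]):
            gap = right - left
            counts[(gap // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    markers = [("1", 1_000_000), ("1", 4_500_000), ("1", 5_000_000),
               ("2", 200_000), ("2", 9_000_000)]
    print(gap_histogram(markers, 1_000_000))
    # prints {0: 1, 3000000: 1, 8000000: 1}
```

Large counts in the high bins point to regions the marker set covers poorly.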

Program output is printed to STDOUT in PropertyList format. It is a dictionary whose keys are the loci names. Each value is again a dictionary containing the physical (key p) and cytogenetic (key band) location. You can redirect the output to a file and use it directly with the --labelMap option of coloredChromosomes.pl.
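The shape of this output can be illustrated with a small Python sketch that builds such a dictionary and writes it in OpenStep-style ASCII property list syntax. Whether the PropertyList module emits exactly this layout is an assumption. The helpers quote and to_plist, the locus names locusA and locusB, and all values are hypothetical placeholders, not real locations.

```python
def quote(text):
    """Quote a string for an OpenStep-style plist (assumed syntax)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_plist(value, indent=0):
    """Serialize nested dicts of strings/numbers as an ASCII property list."""
    pad = "\t" * indent
    if isinstance(value, dict):
        lines = ["{"]
        for key, item in value.items():
            lines.append(f"{pad}\t{quote(str(key))} = {to_plist(item, indent + 1)};")
        lines.append(pad + "}")
        return "\n".join(lines)
    # Scalars (numbers included) are written as quoted strings.
    return quote(str(value))


if __name__ == "__main__":
    # Placeholder values only: p is the physical position, band the cytogenetic band.
    locations = {
        "locusA": {"p": 1000, "band": "1p36"},
        "locusB": {"p": 2000, "band": "2q11"},
    }
    print(to_plist(locations))
```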


OPTIONS

--aliases=aliasesPath
If the marker names from STDIN or from the --locilistPath option differ from those used in the database, you may specify a file that maps between the two name spaces. This file is in PropertyList format and holds a single dictionary whose keys are names from the input and whose values are the corresponding names in the database.

--anonymous
Use this option if the loci names are given in a plain file holding one marker name per line. The first line is then not read as a header specifying column names but is interpreted directly as a marker name.

--calculateGaps
Instead of printing the locations of the loci, print a histogram of the gaps between the specified loci.

-h, --help
Print a help text.

--locilistPath=locilistPath
Points to a file holding the names of the loci to be retrieved. If this option is omitted, STDIN is read instead (see options --locusColumn, --anonymous).

--locusColumn=columnName
The loci list is extracted from a tab-separated file. If more than one column is present, use this option to name the column that holds the loci. Column names are read from the first line of the file.

--retrieveLoci=loci names
With this option, loci names can be passed directly as an argument. For example: --retrieveLoci "HLA-DRB1 HD"

--path=path of database
This option indicates which database to use. This information is forwarded to the retrieving module (see option --source) and interpreted accordingly.

--source=source name
This option specifies which source is used for information retrieval. Besides ENSEMBL, which is the default (see section ENSEMBL below), the ldb database is supported. You can copy the files of this database from ftp://cedar.genetics.soton.ac.uk/pub. Download the whole directory to a local disk and specify your local directory with the --path option.
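The input handling described by --locusColumn, --anonymous and --aliases can be sketched in Python. This is an illustration, not the program's own code. The function read_loci is hypothetical, and raising an error when several columns are present and none is named is an assumed behaviour.

```python
import csv


def read_loci(lines, locus_column=None, anonymous=False, aliases=None):
    """Return locus names from tab-separated lines, mapped through aliases.

    lines: iterable of text lines (e.g. an open file or sys.stdin).
    locus_column: header name of the column holding the loci (--locusColumn).
    anonymous: if True, there is no header line (--anonymous).
    aliases: dict mapping input names to database names (--aliases).
    """
    aliases = aliases or {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip empty lines
    if anonymous:
        # No header: every line holds one marker name in its first field.
        names = [row[0] for row in rows]
    elif not rows:
        names = []
    else:
        header, body = rows[0], rows[1:]
        if locus_column is not None:
            index = header.index(locus_column)  # ValueError if the column is missing
        elif len(header) == 1:
            index = 0
        else:
            # Assumed behaviour: refuse to guess among several columns.
            raise ValueError("several columns present; please name the locus column")
        names = [row[index] for row in body if len(row) > index]
    # Names without an alias entry are passed through unchanged.
    return [aliases.get(name, name) for name in names]


if __name__ == "__main__":
    table = ["name\tchromosome", "markerA\t1", "markerB\t4"]
    print(read_loci(table, locus_column="name", aliases={"markerB": "markerB_db"}))
```

Names given with --retrieveLoci presumably correspond to splitting the argument on whitespace, as in "HLA-DRB1 HD".split(), before the same alias mapping is applied.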


ENSEMBL

The default source for retrieval is ENSEMBL as of package version 1.4.2. You need to install the ENSEMBL API as described in the tutorial at http://www.ensembl.org/Docs/linked_docs/ensembl_tutorial.pdf. For this source, lociLocations.pl interprets the --source argument as host:user:db. It defaults to 'ensembldb.ensembl.org:anonymous:homo_sapiens_core_29_35b'.

Database versions change frequently and old versions become unavailable, so you should use the most recent version numbers from http://www.ensembl.org/Homo_sapiens. If, for example, the web page reads 'Current Release 29.35b', then the database name should be 'homo_sapiens_core_29_35b'. This is clearly a shortcoming of the ENSEMBL database structure, which lacks a reference to the current version. Another limitation is that only gene symbols are supported at present. There is no mechanism to retrieve microsatellite markers. If either of these limitations bothers you, please send an email to helpdesk@ensembl.org (as I did). Hopefully this adds momentum to the required code changes at EMBL.
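The --source convention and the release-to-database-name rule described above can be expressed as a short Python sketch. The function names parse_source and core_db_name are hypothetical. Only the host:user:db format, the default value and the naming example come from this section.

```python
DEFAULT_SOURCE = "ensembldb.ensembl.org:anonymous:homo_sapiens_core_29_35b"


def parse_source(source=None):
    """Split a --source value of the form host:user:db, applying the default."""
    parts = (source or DEFAULT_SOURCE).split(":")
    if len(parts) != 3:
        raise ValueError(f"expected host:user:db, got {source!r}")
    host, user, db = parts
    return {"host": host, "user": user, "db": db}


def core_db_name(release):
    """Map a release label such as '29.35b' to 'homo_sapiens_core_29_35b'."""
    return "homo_sapiens_core_" + release.replace(".", "_")


if __name__ == "__main__":
    print(parse_source())
    print(core_db_name("29.35b"))
```

Deriving the database name from the release label shown on the web page avoids typing it by hand whenever ENSEMBL publishes a new release.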


SEE ALSO

coloredChromosomes.pl(1) PropertyList(3)