mffinder - A program to detect genetic interaction patterns in genetic data sets


mffinder [-dCvntxmSosl] input-file ACLength [minData ORPredis ORProt]


Calling the program

The minimum amount of information to be given to the program is an input file (see section 'The input file format') and the number of alleles that constitute each allele combination (AC), given by ACLength. Optionally, minData specifies the minimum number of observations an AC must have to be included in the output. ORPredis and ORProt are odds ratio (OR) thresholds that also constrain the output: an AC whose computed OR is smaller than ORPredis and bigger than ORProt is not output. Options are as follows:

-d direction
Direction of the bootstrap procedure to compute the test statistic.

-C count
Number of cycles used to estimate the distribution of the test statistic.

-v
Do verbose output.

-n
Do not sort the output. List ACs in the lexicographic order given by locus positions (the order defined by the input file) and allele names (the numbers chosen to denote individual alleles).

-t
Print the output as a tab-separated table.

-x
Ignore ACs which have 0 observations in either group.

-m simulation specification
Do power simulations. The option argument is the name of a file that specifies what to simulate. This is not yet covered in this manual.

-S sortingType
Sort the output according to an option argument. Valid sorting methods are 'or', 'nor', 'frequency', 'p', 'none' with obvious meanings.

-o path
Write the output to the file given as the option argument.

-s model
Compute the test statistic for the data using the given model, which is one of 'Additive', 'SingleCount', 'Genotypes'.

-l loci
Use exactly the loci given as the option argument for further analysis (e.g. -l 'l1 l2 someLocus').
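The synopsis and options above can be combined into a call as in the following minimal sketch. It only assembles an argument list. The helper name build_mffinder_command, the file names and all numeric values are hypothetical, and treating minData, ORPredis and ORProt as an all-or-nothing group is an assumption read off the synopsis.

```python
import shlex


def build_mffinder_command(input_file, ac_length, min_data=None, or_predis=None,
                           or_prot=None, cycles=None, sorting=None,
                           output=None, model=None, loci=None):
    """Assemble an mffinder argument list following the synopsis.

    Hypothetical helper: only options that take an argument are covered.
    """
    cmd = ["mffinder"]
    if cycles is not None:
        cmd += ["-C", str(cycles)]
    if sorting is not None:
        cmd += ["-S", sorting]
    if output is not None:
        cmd += ["-o", output]
    if model is not None:
        cmd += ["-s", model]
    if loci:
        # -l takes all locus names as one space-separated argument.
        cmd += ["-l", " ".join(loci)]
    cmd += [input_file, str(ac_length)]
    optional = (min_data, or_predis, or_prot)
    if any(v is not None for v in optional):
        # Assumption: the three trailing arguments are given together.
        if any(v is None for v in optional):
            raise ValueError("give minData, ORPredis and ORProt together")
        cmd += [str(v) for v in optional]
    return cmd


# Hypothetical file names and thresholds, for illustration only.
cmd = build_mffinder_command("data.txt", 2, min_data=5, or_predis=2.0,
                             or_prot=0.5, cycles=1000, sorting="p",
                             output="result.txt", model="Additive",
                             loci=["IFNA1", "NFKBIA-6"])
print(shlex.join(cmd))  # shell-quoted form of the argument list
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with the space-separated locus argument.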

The input file format

mffinder reads an input file given as a tab-separated ASCII file. It contains both genotype and phenotype information on all individuals to be analyzed. The first line contains the names of the columns. Each column holds genotype information for one locus, except for the last column, which holds the phenotype.


For example, if the first line lists four column names, the first three being 'locus1', 'locus2' and 'HLA', then these are the locus names, and the last column describes the phenotype irrespective of its name. All other lines occur in pairs, with each pair describing one individual. The first line of a pair holds one allele observed at each locus, in the same order as listed in the first line of the file, followed by the phenotype of the individual. The second line gives the second allele of each genotype, and the phenotype is repeated in the last column. Missing data is represented as '.' (dot). All alleles must be one- or two-digit numbers and the phenotype must be either '0' or '1'.
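The format described above can be read as in the following minimal sketch. It is not mffinder's own parser. The function name read_mffinder_input, the phenotype column name 'status' and all allele values in the sample are hypothetical. Alleles are kept as strings so that a spelling such as '07' is preserved.

```python
import io
import re

ALLELE = re.compile(r"^\d{1,2}$")  # one- or two-digit allele names


def read_mffinder_input(handle):
    """Parse the two-lines-per-individual format into (loci, individuals)."""
    lines = [ln.rstrip("\n") for ln in handle if ln.strip()]
    header = lines[0].split("\t")
    loci = header[:-1]  # the last column is the phenotype, whatever its name
    rows = lines[1:]
    if len(rows) % 2:
        raise ValueError("allele lines must come in pairs")
    individuals = []
    for first, second in zip(rows[0::2], rows[1::2]):
        a, b = first.split("\t"), second.split("\t")
        if len(a) != len(header) or len(b) != len(header):
            raise ValueError("wrong number of columns")
        if a[-1] != b[-1] or a[-1] not in ("0", "1"):
            raise ValueError("phenotype must be 0 or 1 on both lines")
        genotypes = {}
        for locus, x, y in zip(loci, a[:-1], b[:-1]):
            pair = []
            for allele in (x, y):
                if allele == ".":
                    pair.append(None)  # missing data
                elif ALLELE.match(allele):
                    pair.append(allele)
                else:
                    raise ValueError("bad allele: " + allele)
            genotypes[locus] = tuple(pair)
        individuals.append({"phenotype": int(a[-1]), "genotypes": genotypes})
    return loci, individuals


# Hypothetical sample: two individuals, three loci, one missing allele.
sample = (
    "locus1\tlocus2\tHLA\tstatus\n"
    "1\t2\t7\t1\n"
    "2\t2\t.\t1\n"
    "1\t1\t12\t0\n"
    "1\t3\t4\t0\n"
)
loci, individuals = read_mffinder_input(io.StringIO(sample))
print(loci)
print(individuals[0])
```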

The format of the output

The output starts with a line listing all loci for which the following analysis is conducted (quoted output is indented):

 Loci under scrutiny: IFNA1 TNFRSF1A-2 NFKBIA-6

Next, one identically structured section is output for each penetrance model.

 Model: Additive
 Count of tests: 20609
 Allele Combination        CTU  CAU   CTA  CAA   FU      FA      RR     p       REE
 IFNA1:07, TNFRSF1A-2:02   1    1360  40   1580  0.0007  0.0253  34.43  0.0000  2.7846
 IFNA1:07, NFKBIA-6:01     3    1324  44   648   0.0023  0.0679  29.97  0.0000  2.4073

The example shows an excerpt from a concrete analysis. Here the Additive model is shown. The line Count of tests states how many test statistics were computed, which can be used for a multiple-testing correction procedure. The next line names the columns of the table, whose rows are listed from the 4th line of the section on.

Allele Combination
The allele combination is listed in the following form l1:a1, l2:a2, ... ln:an where l1, ..., ln are names of the loci and a1, ..., an are the names of the alleles (which must be numbers in the current implementation).

CTU
Count of this AC in the group of unaffecteds.

CAU
Count of all ACs in the group of unaffecteds which were observed for the loci of the current AC.

CTA
Count of this AC in the group of affecteds.

CAA
Count of all ACs in the group of affecteds which were observed for the loci of the current AC.

FU
The frequency of this AC as computed from CTU and CAU.

FA
The frequency of this AC as computed from CTA and CAA.

RR
The odds ratio of this AC. In the excerpt above, the listed value equals FA divided by FU as computed from the unrounded counts.

p
A p-value as computed from an asymptotic Chi-Square statistic (which might be inaccurate if sample sizes are low).

REE
The normalized OR.
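The relation between the count and frequency columns can be checked on a single row, as in the following minimal sketch. It assumes the row was produced as a tab-separated table and takes its values from the first row of the excerpt above. The function names are hypothetical. The sketch makes no claim about how mffinder computes its statistics internally; it only shows that FA / FU reproduces the value listed in the RR column of the excerpt.

```python
def parse_allele_combination(text):
    """Split 'l1:a1, l2:a2' into [(locus, allele), ...]."""
    pairs = []
    for item in text.split(","):
        # rsplit keeps locus names intact even if they contain a colon
        locus, allele = item.strip().rsplit(":", 1)
        pairs.append((locus, allele))
    return pairs


def frequencies(ctu, cau, cta, caa):
    """Return (FU, FA, FA / FU); the ratio is None when FU is zero."""
    fu = ctu / cau
    fa = cta / caa
    ratio = fa / fu if fu > 0 else None
    return fu, fa, ratio


# First data row of the excerpt, written with tab separators.
row = ("IFNA1:07, TNFRSF1A-2:02\t1\t1360\t40\t1580"
       "\t0.0007\t0.0253\t34.43\t0.0000\t2.7846")
fields = row.split("\t")
ac = parse_allele_combination(fields[0])
ctu, cau, cta, caa = (int(v) for v in fields[1:5])
fu, fa, ratio = frequencies(ctu, cau, cta, caa)

print(ac)
print(round(fu, 4), round(fa, 4), round(ratio, 2))
```

The guard for FU equal to zero matters for ACs that were never observed among unaffecteds.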

All following lines list the ACs which satisfy the constraints given by the arguments minData, ORPredis and ORProt. The next penetrance model is introduced by another Model: line.
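The constraints set by minData, ORPredis and ORProt can be sketched as a predicate, following the rule stated in 'Calling the program'. This is an illustration, not mffinder's code. The function name passes_constraints and the thresholds in the usage lines are hypothetical, and which count mffinder compares against minData is an assumption left to the caller.

```python
def passes_constraints(n_observations, odds_ratio, min_data, or_predis, or_prot):
    """Return True if an AC would be listed under the documented constraints.

    n_observations: count compared against minData (assumption: supplied by
    the caller, since the manual does not say which count is used).
    """
    if n_observations < min_data:
        return False
    # An AC is suppressed when its OR lies strictly between ORProt and
    # ORPredis, i.e. smaller than ORPredis and bigger than ORProt.
    if or_prot < odds_ratio < or_predis:
        return False
    return True


# Hypothetical thresholds: minData=5, ORPredis=2.0, ORProt=0.5.
print(passes_constraints(41, 34.43, 5, 2.0, 0.5))  # OR above ORPredis
print(passes_constraints(41, 1.2, 5, 2.0, 0.5))    # OR between the thresholds
print(passes_constraints(3, 34.43, 5, 2.0, 0.5))   # too few observations
```

With these thresholds only ACs with an OR of at least 2.0 or at most 0.5 remain, provided they were observed at least five times.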