This data file contains general information on the loci. For each locus, one provides the name, chromosome (the unspecific ‘autosome’ is allowed), number of alleles, and number of phenotypes defined at that locus. Then for each allele, one includes the name of the allele and its frequency. Finally, for each phenotype one defines which genotypes are consistent with that phenotype. (Of course if the pedigree file contains only genotypes at some locus, a common occurrence today, then there may be no phenotypes defined at that locus.)
Two points are crucial to keep in mind. First, the analysis is performed using only those loci in both the map and locus data files. Second, the locus file and the pedigree file must be perfectly coordinated in the sense that the phenotype fields for individuals must match exactly the order of the loci in the locus data file.
Fortran uses the following format codes, also called descriptors, to describe data: (A) is used for character data, (I) for integer data, (F) for numbers with decimals, and (X) for blank spaces. For example, (A8) specifies a character field eight characters wide, (I2) specifies an integer occupying two spaces, (F8.5) specifies a number with a decimal part occupying eight spaces, five of which follow the decimal point, and (1X) specifies a single blank space.
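As a rough illustration of how these descriptors carve up a line, the sketch below reads fixed-width fields in Python. The function name `parse_fixed_width`, the layout, and the sample record are all hypothetical and chosen only for illustration; they are not the actual record layout of the locus file. The sketch also assumes the decimal point is written explicitly in every (F) field.

```python
def parse_fixed_width(line, fields):
    """Split a line into fixed-width fields.

    `fields` is a list of (kind, width) pairs, where kind is
    'A' (character), 'I' (integer), 'F' (decimal) or 'X' (blanks to skip).
    """
    values = []
    pos = 0
    for kind, width in fields:
        chunk = line[pos:pos + width]
        pos += width
        if kind == "X":
            continue  # blank spaces carry no data
        text = chunk.strip()
        if kind == "A":
            values.append(text)
        elif kind == "I":
            values.append(int(text))
        elif kind == "F":
            # Assumes an explicit decimal point in the field.
            values.append(float(text))
        else:
            raise ValueError(f"unknown descriptor {kind!r}")
    return values


# Hypothetical layout (A8, 1X, I2, 1X, F8.5); not taken from SimWalk2.
layout = [("A", 8), ("X", 1), ("I", 2), ("X", 1), ("F", 8)]
record = "MARKER01  3  0.25000"
parsed = parse_fixed_width(record, layout)
```

Because every field has a fixed width, a value that is shifted by even one column lands in the wrong field, which is why the column positions in the data files matter.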
The locus data file contains information describing the genetic loci involved in a problem. The sample locus file below includes two loci, ABO and MK.
____________________<top of file>_____________________
ABO AUTOSOME 3 4
MK AUTOSOME 2 3
___________________<bottom of file>___________________
Inspection of this example shows that data on the loci are provided one locus at a time. Keeping the format descriptors mentioned above in mind, the following records are required for each locus:
Another example LOCUS.DAT file is available with annotations.
Implicit in the above conventions is the assumption that phenotype penetrances are either 0 or 1. This is true when a genotype always gives rise to the same qualitative phenotype, e.g., for all codominant marker loci. For disease genes with incomplete penetrance one must also specify a penetrance data file, described below.
Also, for disease genes, one should list the normal, or wildtype, allele first and the affected allele second. This is vital for coordination with the penetrance file.
For a marker locus often no phenotypes at all will be attached to the locus; only those phenotypes appearing in the associated pedigree file are really necessary. However, at least one allele should always be listed for each locus. An error is produced if allele frequencies for a locus do not sum to approximately 1. (If they sum to approximately 1, then they may be adjusted slightly to force them to sum to exactly 1, in which case a warning message is issued.)
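The frequency check described above can be sketched as follows. This is only an illustration: the tolerance of 0.001, the rescaling by the total, and the function name `check_allele_frequencies` are assumptions, since the text does not say how close to 1 the sum must be or how SimWalk2 makes the adjustment.

```python
def check_allele_frequencies(freqs, tolerance=1e-3):
    """Return (adjusted_frequencies, warned).

    Raises ValueError if the frequencies do not sum to approximately 1.
    The tolerance and the rescaling rule are assumptions of this sketch.
    """
    if not freqs:
        raise ValueError("at least one allele must be listed")
    total = sum(freqs)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    # Rescale so the adjusted frequencies sum to 1 (up to rounding).
    adjusted = [f / total for f in freqs]
    warned = total != 1.0  # a warning would accompany any adjustment
    return adjusted, warned


adjusted, warned = check_allele_frequencies([0.3, 0.3, 0.3999])
```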
The locus file and the pedigree file must be coordinated in the sense that the phenotype fields for individuals must match exactly the order of the loci in the locus file. Thus, a pedigree file appropriate to the above locus file would have ABO and MK phenotypes as items six and seven on each individual record (see the pedigree data file format definitions). No other phenotypes would be expected or allowed. (For SimWalk2, the order in which the loci are analyzed may easily be altered from that in the locus and pedigree files using either a map file or batch item #14 in the BATCH2.DAT file.)
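To make the coordination concrete, the short sketch below computes which item of an individual's record holds the phenotype for each locus, given the order of loci in the locus file. The function name is hypothetical, and the default starting item of six is taken from the example in the text; whether that holds for every pedigree file is an assumption.

```python
def phenotype_item_positions(locus_names, first_item=6):
    """Map each locus name to the item number of its phenotype field.

    `first_item=6` follows the ABO/MK example; treat it as an assumption.
    """
    return {name: first_item + offset
            for offset, name in enumerate(locus_names)}


positions = phenotype_item_positions(["ABO", "MK"])
```

Reordering the loci in the locus file without reordering the phenotype fields in the pedigree file would silently pair each phenotype with the wrong locus, which is why the two files must agree.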
For a locus with many codominant alleles, it is cumbersome to list a large number of phenotypes in the locus file. As a matter of convenience, genotypes can be substituted for phenotypes in the pedigree file. For instance, at the ABO locus the genotype A/B can be substituted for the phenotype AB wherever it appears in the pedigree file. If this is done, the two constituent alleles A and B on either side of the forward slash will be identified and it will be checked that these are among the possible alleles in the locus file. Provided all people of phenotype AB are listed as A/B in the pedigree file, the phenotype AB can then be omitted from the locus file. Note that all genotypes substituted for phenotypes in the pedigree file must occupy eight characters or fewer.
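The genotype substitution rule can be sketched in a few lines. The function name `split_genotype` and the choice to return `None` for an ordinary phenotype name are assumptions of this illustration, not a description of SimWalk2's internals.

```python
def split_genotype(field, alleles):
    """Interpret a pedigree field as a genotype if it contains '/'.

    Returns a pair of allele names, or None when the field has no
    forward slash and should be treated as a phenotype name.
    """
    text = field.strip()
    if len(text) > 8:
        raise ValueError(f"{text!r} is longer than eight characters")
    if "/" not in text:
        return None  # an ordinary phenotype name, e.g. 'AB'
    left, right = text.split("/", 1)
    for allele in (left, right):
        if allele not in alleles:
            raise ValueError(f"allele {allele!r} is not in the locus file")
    return (left, right)


abo_alleles = {"A", "B", "O"}
genotype = split_genotype("A/B", abo_alleles)
```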
The SimWalk2 locus file uses the same format as that required by Mendel version 3, with the following additional restrictions:
- all loci must be autosomal;
- all marker loci must be codominant;
- the trait locus, if present, must be the initial locus;
- the trait allele names must be at most 3 characters long;
- the trait genotypes must be unordered; and
- no allele name may contain the character '*' (asterisk) or '/' (forward slash).
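Two of the naming restrictions above lend themselves to a quick check before running an analysis. The sketch below is a hypothetical helper, not part of SimWalk2; it only covers the forbidden characters and the three-character limit on trait allele names.

```python
def check_allele_names(names, is_trait=False):
    """Return a list of problems found in the allele names of one locus."""
    problems = []
    for name in names:
        if "*" in name or "/" in name:
            problems.append(f"{name!r} contains '*' or '/'")
        if is_trait and len(name) > 3:
            problems.append(f"trait allele {name!r} is longer than 3 characters")
    return problems


marker_problems = check_allele_names(["A", "B", "O"])
trait_problems = check_allele_names(["NORM", "D/1"], is_trait=True)
```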