_____________________________<Top of File>______________________________________
HEF version 1.1.1 generated by SimWalk2 2.89

Test Data Set (Run 22)

0 : Name of chromosome in this file (0 = unknown)


2 marker loci in this file

Marker   Position-(cM) Number of  Allele Names and
Name     Female   Male  Alleles   Frequencies

D22S15     0.00   0.00      10
                                    1  0.460    2  0.460    3  0.010    4  0.010
                                    5  0.010    6  0.010    7  0.010    8  0.010
                                    9  0.010   10  0.010
D22S756   11.25  10.10       6
                                    1  0.000    2  0.050    3  0.100    4  0.650
                                    5  0.050    6  0.150


2 pedigrees in this file

Pedigree Name,       ID   Father   Mother Sex Trait Alleles   Source      Data
No. of Individuals                           Pheno Pat Mat   Pat Mat   Pat Mat
and Pedigree Score
________________________________________________________________________________
Oxford(ped#001;run#22)
3 individuals
-101.234
                      1        0        0  1 2*
                                                     1   2     0   0     1   1
                                                     3   4     0   0     1   1
                      2        0        0  2 1
                                                     0   2     0   0     0   0
                                                     4   3     0   0     1   1
                      3        1        2  2 2*
                                                     1   2     1   2     1   1
                                                     3   3     1   2     1   1
________________________________________________________________________________
19980915(ped#002;run#22)
6 individuals
-221.876
                      1        0        0  2 1
                                                     1   2     0   0     1   1
                                                     3   4     0   0     1   1
                      2        0        0  1 unknown
                                                     2   0     0   0     0   0
                                                     3   0     0   0     0   0
                      3        2        1  2 1
                                                     2   1     1   1     1   1
                                                     3   4     1   2     1   1
                      4        2        1  1 affected*
                                                     2   2     1   2     1   1
                                                     3   4     1   2     1   1
                      5        0        0  1 unknown
                                                     0   0     0   0     0   0
                                                     0   5     0   0     0   0
                      6        5        3  2 3*
                                                     0   1     2   2     0   0
                                                     5   4     2   2     1   1
_____________________________<Bottom of File>___________________________________

Detailed description of Haplotype Exchange Format version 1.1.1

HEF is designed to be both human readable and easily parsed by many languages: C, VB, Fortran, etc. To enable column-centric languages to parse the file, the white space should consist only of spaces, no tabs. There must be at least one space between each data word defined below. However, there should be no spaces within any of the data words defined below. Many haplotyping programs will require that the allele names be sequential integers starting with 1. All lines not listed below are ignored. They should probably be blank for human readability. Do not use underscore characters except where noted below.
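The whitespace rule above is easy to check before parsing. The following is a minimal sketch in Python; the function name `find_tab_lines` is hypothetical and not part of SimWalk2 or of the format itself.

```python
def find_tab_lines(lines):
    """Return the 1-based numbers of all lines that contain a tab character.

    HEF asks for spaces only, so any line reported here would need its
    tabs replaced by spaces before a column-based parser reads the file.
    """
    return [number for number, line in enumerate(lines, start=1)
            if "\t" in line]
```

An empty result means the file satisfies the "spaces only" rule; nothing else about the format is checked by this sketch.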
Line 01: the first words are the name and version number of the format that this file conforms to, e.g., "HEF version 1.1.1", where HEF stands for Haplotype Exchange Format; optionally, the rest of the line may give the name and version of the program that generated the file.

Line 03: title for this run (in the first 40 columns); the title may be blank.

Line 05: the first word (in the first 8 columns) is the name of the chromosome displayed in this file. It can be X, Y, U, or a non-negative integer less than 23.

Line 08: the first word (in the first 8 columns) is the number of marker loci displayed in this file.

Line 10: titles for human readability.

Line 11: titles for human readability.

Line 13: the first word (in the first 8 columns) is the name of the first marker locus;
    the second word (in columns 10-15) is the position of this marker on female haplotypes, in cM from some fixed starting point;
    the third word (in columns 17-22) is the position of this marker on male haplotypes, in cM from some fixed starting point;
    the fourth word (in columns 28-30) is the number of alleles at this marker.
    All marker distances are measured from the SAME fixed starting point.

Line 14: the first word (in columns 35-37) is the name of the first allele;
    the second word (in columns 40-44) is the first allele's frequency;
    the third word (in columns 47-49) is the name of the second allele;
    the fourth word (in columns 52-56) is the second allele's frequency;
    the fifth word (in columns 59-61) is the name of the third allele;
    the sixth word (in columns 64-68) is the third allele's frequency;
    the seventh word (in columns 71-73) is the name of the fourth allele;
    the eighth word (in columns 76-80) is the fourth allele's frequency.
    This type of line is repeated until the number of alleles, which was read on the marker line (Line 13), is exhausted.

The next line is similar to Line 13 but for the second marker. It is followed by the second marker's allele frequency lines, similar to Line 14, and so on. This is repeated until the number of markers, read on Line 08, is exhausted.
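The header layout described above can be read by splitting each line on spaces, since data words never contain spaces. The following Python sketch assumes the file has already been read into a list of strings, one per line; the function name `parse_hef_header` and the dictionary keys are hypothetical choices for illustration, not part of the format.

```python
def parse_hef_header(lines):
    """Parse Line 01 through the last allele frequency line.

    `lines` holds the file's lines in order, so Line 01 is lines[0].
    """
    words = lines[0].split()
    if len(words) < 3 or words[:2] != ["HEF", "version"]:
        raise ValueError("Line 01 does not start with 'HEF version <number>'")
    header = {
        "version": words[2],
        "title": lines[2][:40].strip(),       # Line 03, first 40 columns
        "chromosome": lines[4].split()[0],    # Line 05
        "markers": [],
    }
    n_markers = int(lines[7].split()[0])      # Line 08

    i = 12                                    # Line 13: first marker line
    for _ in range(n_markers):
        name, female_cm, male_cm, n_alleles = lines[i].split()[:4]
        i += 1
        frequencies = {}
        # Frequency lines alternate allele name and frequency and
        # continue until every allele of this marker has been read.
        while len(frequencies) < int(n_alleles):
            words = lines[i].split()
            i += 1
            for allele, freq in zip(words[0::2], words[1::2]):
                frequencies[allele] = float(freq)
        header["markers"].append({
            "name": name,
            "female_cm": float(female_cm),
            "male_cm": float(male_cm),
            "frequencies": frequencies,
        })
    return header
```

Splitting on whitespace rather than slicing fixed columns is a design choice of this sketch; a column-based reader would use the column ranges listed above instead.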
After skipping two lines, the next line's first word (in columns 1-8) is the number of pedigrees in this file. The next line is ignored. (see Lines 20-23)

After three lines of titles for human readability and one line of underscores, the pedigree haplotype data begins. (see Lines 24-27)

For each pedigree, the first word (in the first 32 columns) on the first line of pedigree data is the name of the pedigree whose haplotype follows. (see Line 28) The first word (in the first 8 columns) on the second line of pedigree data is the number of individuals in the pedigree. (see Line 29) The first word (in the first 8 columns) on the third line of pedigree data is a real-number score for the haplotype that follows. (see Line 30)

This score can be used to show the overall log10 likelihood of this haplotype. Alternatively, it can be used to show the relative likelihood of different haplotypes of the same pedigree, e.g., 0.45 versus 0.35 and 0.20. In this case the pedigree is listed three times, each time with a different haplotype but always with the same pedigree name. (This counts as three pedigrees in the "number of pedigrees in this file" data value.)

The first word (in columns 16-23) on the next line is the ID of the first individual. The second word (in columns 25-32) is the ID of this individual's father. The third word (in columns 34-41) is the ID of this individual's mother. The code for a missing parent is 0 (zero). The fourth word on this line (in column 44) is the code for the sex of this individual. The code for male is 1 (one) and the code for female is 2 (two). The fifth word on this line (in columns 46-54) is the trait phenotype for this individual. A missing trait phenotype is coded using the string 'unknown'. An asterisk ('*') is added to the end of the phenotype if this individual is affected. (see Line 31)

Next is one line for each marker locus.
On each of these lines, the first word (in columns 52-54) is the name of the allele at this marker locus on the paternal haplotype. The second word (in columns 56-58) is the name of the allele at this marker locus on the maternal haplotype. The code for an allele that is never typed as it descends through the pedigree is 0 (zero); there is no information to determine which allele this is.

The third word on this line (in column 64) is the code for the grandparental source of the paternal allele at this marker locus. The fourth word on this line (in column 68) is the code for the grandparental source of the maternal allele at this marker locus. These grandparental source codes are always 0 (zero) for founders and either 1 (one) or 2 (two) for non-founders. A 1 (one) indicates the allele came from the paternal grandparent and thus from the paternal haplotype of the parent. A 2 (two) indicates the allele came from the maternal grandparent and thus from the maternal haplotype of the parent. With grandparental source information, haplotype bars can always be drawn, even when the parents are homozygous. A change in the source code from a 1 (one) to a 2 (two), or vice versa, between adjacent markers indicates a recombination event in that interval.

The fifth word (in column 74) on this line is the code for the data availability of the paternal allele at this marker locus. The sixth word (in column 78) on this line is the code for the data availability of the maternal allele at this marker locus. These data codes are always either 0 (zero) or 1 (one). A 0 (zero) indicates this allele was not typed and is being inferred. A 1 (one) indicates this allele was typed in the original dataset. (An allele forced by the other data is considered as typed.)

The next line is similar except that it is for the second marker locus. (see Lines 32-33) This is repeated until all marker loci are exhausted.
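The per-individual layout and the recombination rule can be illustrated with a short Python sketch. It assumes whitespace-separated words as required by the format; the function names `parse_individual_block` and `recombination_intervals` and the dictionary keys are hypothetical and chosen only for this example.

```python
def parse_individual_block(lines, n_markers):
    """Parse one individual: an ID line followed by one line per marker."""
    ident, father, mother, sex, trait = lines[0].split()[:5]
    affected = trait.endswith("*")       # trailing asterisk marks affected
    if affected:
        trait = trait[:-1]
    person = {
        "id": ident,
        "father": father,                # "0" means the parent is missing
        "mother": mother,
        "sex": int(sex),                 # 1 = male, 2 = female
        "trait": None if trait == "unknown" else trait,
        "affected": affected,
        "alleles": [], "source": [], "typed": [],
    }
    for line in lines[1:1 + n_markers]:
        pat, mat, src_pat, src_mat, dat_pat, dat_mat = line.split()[:6]
        person["alleles"].append((pat, mat))
        person["source"].append((int(src_pat), int(src_mat)))
        person["typed"].append((dat_pat == "1", dat_mat == "1"))
    return person


def recombination_intervals(codes):
    """Return each index i where the source code switches between
    marker i and marker i + 1 (a 1 to 2 change or vice versa).

    Founders carry code 0 at every marker, so they never report a switch.
    """
    return [i for i in range(len(codes) - 1)
            if codes[i] != codes[i + 1] and 0 not in (codes[i], codes[i + 1])]
```

For individual 3 of the second example pedigree, the maternal source codes are 1 at D22S15 and 2 at D22S756, so `recombination_intervals([1, 2])` returns `[0]`: one recombination in the interval between the two markers. The paternal codes are 1 and 1, which gives an empty list.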
Then the next individual is reported, with a first line for their ID, parents, sex and trait, followed by a series of haplotype entry lines, one for each marker locus. This is repeated until the number of individuals in this pedigree is exhausted.

The first pedigree is completed by a line of underscores. (see Line 40) The next line begins the data for the second pedigree. (see Lines 41-62) This is repeated for each pedigree in the file. After the last pedigree, all following lines are ignored.
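The repetition over individuals and pedigrees can be sketched as a pair of nested loops. This Python sketch assumes `lines` starts at the first pedigree name line (Line 28 in the example) and that the pedigree and marker counts were read earlier; the function name `read_pedigrees` and the returned structure are hypothetical, and each line is kept as its list of words rather than interpreted further.

```python
def read_pedigrees(lines, n_pedigrees, n_markers):
    """Read every pedigree block, starting at the first pedigree name line."""
    pedigrees = []
    i = 0
    for _ in range(n_pedigrees):
        name = lines[i].split()[0]               # pedigree name
        n_individuals = int(lines[i + 1].split()[0])
        score = float(lines[i + 2].split()[0])   # pedigree score
        i += 3
        individuals = []
        for _ in range(n_individuals):
            id_words = lines[i].split()          # ID, father, mother, sex, trait
            marker_words = [lines[i + 1 + m].split() for m in range(n_markers)]
            individuals.append((id_words, marker_words))
            i += 1 + n_markers
        i += 1                                   # closing line of underscores
        pedigrees.append({"name": name, "score": score,
                          "individuals": individuals})
    return pedigrees
```

Because alternative haplotypes of one pedigree share a name and each counts toward the pedigree total, a caller that wants to compare them could group the returned entries by `name` and compare their `score` values.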