---
title: "pould: Phased Or Unphased Linkage Disequilibrium"
author: "Steven J. Mack, PhD -- steven.mack@ucsf.edu"
date: "October 8, 2020"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using pould}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(rmarkdown.html_vignette.check_title = FALSE)
```

* Package Version: 1.0.1

# Overview

The *pould* package (pronounced "pooled") calculates four linkage disequilibrium (LD) statistics -- *D'*, *W_n*, and the conditional asymmetric LD (cALD) measures *W_A/B* and *W_B/A* -- for genotype data from pairs of genetic loci, and can treat these data as either phased or unphased for these calculations.

The package includes a wrapper function that parses either column-formatted genotypes or multi-locus haplotypes in the [17th International HLA and Immunogenetics Workshop](http://17ihiw.org) (IHIW) Family Haplotype Project's HaplObserve output format. This wrapper function generates output files in a user-defined directory. The package also includes a function that applies a sign test to LD values for phased and unphased haplotypes generated by the wrapper function for haplotype- or genotype-formatted datasets, and a function that generates heat-maps for each LD measure.

For more information, see:

Osoegawa et al. [Hum Immunol. 2019;80(9):633-643](https://doi.org/10.1016/j.humimm.2019.01.010).

Osoegawa et al. [Hum Immunol. 2019;80(9):644-660](https://doi.org/10.1016/j.humimm.2019.05.018).

For information about these LD measures, see:

Hedrick PW. [Genetics. 1987;117:331-41](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1203208/pdf/331.pdf).

Cramér H. Mathematical Methods of Statistics, 1946, Princeton University Press, Princeton NJ.

Thomson G, Single RM. [Genetics. 2014;198(1):321-31](https://doi.org/10.1534/genetics.114.165266).

*Pould* accepts genotype and haplotype data for *individual subjects* as input data. To calculate the cALD measures using haplotype frequency data, try the [*asymLD*](https://CRAN.R-project.org/package=asymLD) package. See Single et al. [Hum Immunol. 2016;77(3):288-294](https://doi.org/10.1016/j.humimm.2015.09.001) for more information about *asymLD*.

## Functions and Input Formats

### cALD

The *cALD()* function calculates the LD measures from genotype data for pairs of loci. The `drb1.dqb1.demo` dataset, shown below, illustrates the input format. The function accepts a 4-column data frame or tab-delimited text file with data for the first locus in columns 1 and 2, and data for the second locus in columns 3 and 4.

```
> drb1.dqb1.demo[1:6,]
##    DRB1  DRB1  DQB1  DQB1
## 1 15:01 07:01 06:02 03:03
## 2 04:05 13:01 03:02 06:03
## 3 04:01 13:02 03:02 06:09
## 4 16:02 04:05 05:02 03:02
## 5 15:01 07:01 06:02 02:02
## 6 04:01 04:01 03:02 03:02
```

The locus names should be used as column headers, with one allele/variant in each column. Each row represents a single subject. The headers for columns 1 and 2 must be identical, as must the headers for columns 3 and 4. If phase is known, *cALD()* assumes that columns 1 and 3 represent one haplotype, and that columns 2 and 4 represent the second haplotype.

While HLA data are shown above, *cALD()* can accept any genetic data. The example below combines data for HLA-DQA1 and rs7743506.

```
HLA-DQA1  HLA-DQA1  rs7743506  rs7743506
02:01     01:02     C          C
04:01     01:02     A          C
04:01     05:01     A          C
01:01     01:02     C          C
04:01     01:01     A          C
05:01     01:02     C          C
```

### LDWrap

The *LDWrap()* function parses either genotype data formatted in a two-column per locus format, or haplotype data formatted using the 17th International HLA and Immunogenetics Workshop Family Haplotype Project's GL String-based format.
The function accepts a data frame, a tab-delimited (.txt or .tsv) columnar genotype file, or a comma-separated values (.csv) GL String haplotype file, and passes genotype data for all pairs of loci in that dataset to *cALD()* for LD analysis.

#### Input Formats

##### Haplotype Data

A minimal *LDWrap()* haplotype data file or data frame contains two columns named "Relation" and "Gl String". Other columns are allowed, but are ignored. The `hla.hap.demo` dataset (shown below in edited form) illustrates the input format. Each row contains data for a single subject. The "Relation" column can contain any text string; however, values such as "mother", "father" and "child" are standard for the Family HLA Data Project. *LDWrap()* will ignore all rows in which `Relation=child`; rows with any other value in the "Relation" column will be processed.

```
Relation  Gl String
Subject   HLA-A*02:01~HLA-C*07:02~HLA-B*07:02+HLA-A*01:01~HLA-C*06:02~HLA-B*57:01
Subject   HLA-A*03:01~HLA-C*07:01~HLA-B*49:01+HLA-A*01:01~HLA-C*07:01~HLA-B*08:01
Subject   HLA-A*11:01~HLA-C*04:01~HLA-B*15:01+HLA-A*03:01~HLA-C*08:02~HLA-B*14:02
Subject   HLA-A*68:01~HLA-C*15:02~HLA-B*40:06+HLA-A*68:01~HLA-C*06:02~HLA-B*45:01
```

The "Gl String" column contains a [GL String](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3715123/pdf/tan0082-0106.pdf) formatted multi-locus HLA haplotype. In GL String format, the `~` operator denotes phase, and the `+` operator denotes copies of genes (in this case diploidy). While GL Strings can be used to describe ambiguous alleles and genotypes using the `/` and `|` operators, ambiguous data cannot be included in an *LDWrap()* data file. *LDWrap()* requires that alleles be described as LOCUS`*`VARIANT (e.g., `HLA-DRB1*01:01`); locus prefixes (e.g., `HLA-`) are not required, but if locus prefixes are included, all loci must be described using the same prefix. Allele data described without a locus (e.g., `01:01`) are not allowed.
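The GL String operators described above can be illustrated with base R string functions. The sketch below is not how *LDWrap()* parses its input; it only splits the first GL String from the example into its two haplotypes and their alleles. The names `gl`, `haps`, `parse_hap` and `hap_table` are illustrative and are not part of *pould*.

```r
# First GL String from the hla.hap.demo excerpt shown above
gl <- "HLA-A*02:01~HLA-C*07:02~HLA-B*07:02+HLA-A*01:01~HLA-C*06:02~HLA-B*57:01"

# "+" separates the two gene copies (here, the two haplotypes of a diploid subject)
haps <- strsplit(gl, "+", fixed = TRUE)[[1]]

# "~" separates the phased alleles within each haplotype
alleles <- strsplit(haps, "~", fixed = TRUE)

# Split each LOCUS*VARIANT allele into a variant named by its locus
parse_hap <- function(a) {
  parts <- do.call(rbind, strsplit(a, "*", fixed = TRUE))
  setNames(parts[, 2], parts[, 1])
}

# One row per haplotype, one column per locus
hap_table <- t(sapply(alleles, parse_hap))
hap_table
```

Each row of `hap_table` holds one haplotype, so the alleles in a row are in phase with each other, which is the relationship the `~` operator encodes.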
Unusual allele names (`HLA-A*NULL`, `HLA-DRB1*NoMatch`, `HLA-DPB1*NT`) and truncated versions of allele names (`HLA-A*01`, `HLA-A*01:01`, `HLA-A*01:01:01`, etc.) will be analyzed as distinct alleles, and may skew analytic results. *LDWrap()* includes an option to truncate colon-delimited allele names to specific numbers of fields for analysis.

##### Genotype Data

A minimal *LDWrap()* genotype data file or data frame contains two columns per locus, with one allele in each column, as for *cALD()*, but accommodating more than two loci. Columns for the same locus must be adjacent and can have identical names, or can be suffixed with "_1" and "_2". Columns named "SampleID" and "Disease" are permitted, but not required. No other columns are allowed. Allele names in each column can include a locus name (e.g., `locus*allele`); if locus names are not included, the locus name in the header will be associated with each allele in that column.

##### Locus Order

The order in which the loci appear, either in columns or in the GL String haplotype, affects the identification of haplotype locus pairs for analysis. For example, if HLA loci are organized alphabetically, haplotypes of the HLA-B and HLA-C loci will be analyzed as B~C haplotypes; if they are organized by map order, those haplotypes will be analyzed as C~B haplotypes. B~C and C~B haplotypes will not be recognized as the same by *LD.heat.map()*. To avoid this, use the same organization of loci for all analyses, and use map order for HLA or KIR loci.

### LD.sign.test

The *LD.sign.test()* function applies the R *stats* package's *binom.test()* function to pairs of LD values (*D'*, *Wn*, *WLoc1/Loc2*, *WLoc2/Loc1*), as well as the number of haplotypes, for phased and unphased haplotypes. The *_LD_results.csv* files generated by *LDWrap()* are the input files for this function, and generally, *LDWrap()* must be used before this function can be applied.
See the *LDWrap()* Outputs section below for an example.

### LD.heat.map

The *LD.heat.map()* function generates heat-map plots of the LD values generated for each LD measure (*D'*, *Wn*, *WLoc1/Loc2*, and *WLoc2/Loc1*) for phased and unphased haplotypes. If LD values for only phased or unphased haplotypes are available, half-matrix heat-maps will be generated. The *_LD_results.csv* files generated by *LDWrap()* are the input files for this function, and generally, *LDWrap()* must be used before this function can be applied. See the *LDWrap()* Outputs section below for an example.

## Outputs

### cALD()

By default, *cALD()* operates in "verbose" mode, and will write five lines of output to the console describing the phase-status of the LD analysis (phased or unphased), the loci and number of haplotypes analyzed, and the four LD measures calculated, as shown below.

```
> cALD(drb1.dqb1.demo)
## Calculating D', Wn and conditional ALD for 53 unphased genotypes at the DRB1 and DQB1 loci.
## D' for DRB1~DQB1 haplotypes: 0.95892767844544 (0.9589)
## Wn for DRB1~DQB1 haplotypes: 0.811250972337927 (0.8113)
## Variation of DQB1 conditioned on DRB1 (WDQB1/DRB1) = 0.904035615838528 (0.904)
## Variation of DRB1 conditioned on DQB1 (WDRB1/DQB1) = 0.778712696009626 (0.7787)
```

When `verbose=FALSE`, *cALD()* returns a vector of *D'*, *W_n*, *W_B/A*, *W_A/B* and the number of haplotypes, as below.

```
> cALDres <- cALD(drb1.dqb1.demo, verbose=FALSE)
> cALDres
## [1] "0.958463650196244" "0.811184752436694" "0.903300938910147" "0.778712697633606" "53"
```

In addition, when `saveVector=TRUE`, *cALD()* will write a text file, containing a vector of all haplotypes, their frequencies and counts for the analyzed locus pair, to a user-specified directory. Unless specified via the `vecDir` parameter, this file is written to the directory specified by *tempdir()*. This vector file also includes information on the dataset and phase status applied to the genotype data for the analysis.
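Because the haplotype vector file is plain tab-delimited text, it can be read back into R with base functions. The sketch below does not call *cALD()*; it writes a two-row stand-in file whose columns follow the layout described for this file (the file name `example_haplotype_vector.txt` and the object names are hypothetical, and the two rows are illustrative), then reads it and keeps the haplotypes with non-zero counts.

```r
# Stand-in for a file written by cALD(..., saveVector = TRUE); cALD() is not called here.
vec_file <- file.path(tempdir(), "example_haplotype_vector.txt")  # hypothetical file name
example_rows <- data.frame(
  Dataset = "DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12",
  Phase = FALSE,
  `DRB1~DQB1` = c("01:01~02:01", "03:01~02:01"),
  Frequency = c(0, 0.094272076372315),
  Count = c(0, 79),
  check.names = FALSE
)
write.table(example_rows, vec_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the tab-delimited file; check.names = FALSE keeps the "DRB1~DQB1" header intact.
hap_vec <- read.delim(vec_file, check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the haplotypes that were actually observed.
observed <- hap_vec[hap_vec$Count > 0, ]
observed
```

Passing `check.names = FALSE` matters here because the locus-pair column header contains `~`, which R would otherwise rewrite into a syntactically valid name.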
An example generated for the `drb1.dqb1.demo` dataset is shown below.

```
> cALD(drb1.dqb1.demo,saveVector = TRUE)
```

Haplotype vector file contents:

```
Dataset                                         Phase  DRB1~DQB1    Frequency          Count
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  01:01~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  01:02~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  01:03~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  03:01~02:01  0.094272076372315  79
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  04:01~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  04:02~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  04:03~02:01  0                  0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE  04:04~02:01  0                  0
.
.
.
```

### LDWrap

*LDWrap()* sends data to *cALD()*, captures the vector of LD results returned by *cALD()*, directs *cALD()* to write vectors of haplotypes for each locus pair into a user-specified directory, and writes a single table of LD results for all locus pairs to that same directory. As a single haplotype vector file is written for each locus pair, *LDWrap()* directs *cALD()* to write n(n-1)/2 haplotype vector files, where n is the number of loci in a haplotype. If the user specifies no destination for the output files, they are written to the directory specified by *tempdir()*.

The only information *LDWrap()* returns to the console is a notification that the analysis has completed (`LD Analysis Complete`) and, where applicable, notifications that the provided dataset is missing the required columns.

When all locus pairs have been analyzed by *cALD()*, *LDWrap()* writes a six-column CSV file (`*LD_results.csv`) of aggregated LD result vectors collected from *cALD()* to the specified directory. The column headers in this file are Loc1~Loc2 (identifying the locus pair), *D'*, *W_n*, *W_B/A*, *W_A/B* and N_Haplotypes. An example of this file is shown below.
```
> LDWrap(hla.hap.demo)
LD Analysis Complete
```

LD results file contents:

```
Loc1~Loc2,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
A~C,0.469024805013898,0.362566555750013,0.366359427624652,0.384413960789992,191
A~B,0.540780240853345,0.446662593270748,0.36839918931955,0.471334711300434,241
A~DRB1,0.400002012804198,0.335434108343871,0.27413544158564,0.320399398449896,233
A~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=8,.,
.
.
.
```

*LDWrap()* attempts to perform these LD calculations for all pairs of loci in the *LDWrap()* datafile. If a haplotype dataset includes locus pairs for which the number of subjects is below the `threshold` value (see "Parameters", below), the `*LD_results.csv` file will include rows for locus pairs for which no LD calculations were performed. As shown above, those rows contain data similar to that shown for the A~DRB3 haplotype -- `Not Calculated`, `Subject Threshold=10`, `Complete subjects=8`, `.`, and an empty final field.

As shown below, in cases where at least one locus in a pair is monomorphic, no LD calculations are performed and the pertinent rows in the `*LD_results.csv` file will contain `Not Calculated`, `Subject Threshold=10`, the number of complete subjects (e.g., `Complete subjects=130`), `"locusName" is monomorphic.`, and an empty final field.

```
Loc1~Loc2,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
B~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=130,DRB3 is monomorphic.,
DRB3~DRB4,Not Calculated,Subject Threshold=10,Complete subjects=130,DRB3 is monomorphic. DRB4 is monomorphic.,
DRB3~DQA1,Not Calculated,Subject Threshold=10,Complete subjects=93,DRB3 is monomorphic.,
```

### LD.sign.test

For each LD measure and the number of haplotypes for phased and unphased versions of the same genotype data, *LD.sign.test()* reports the p-value of the sign test, comparing the number of locus pairs for which the value of the measure is higher in unphased haplotypes than phased haplotypes to the number of locus pairs for which that value is lower or equal.
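The arithmetic of such a sign test can be sketched directly with *binom.test()*. The example below assumes a two-sided test with a null probability of 0.5 (the *binom.test()* defaults); how *LD.sign.test()* calls *binom.test()* internally is not shown in this vignette, so treat this as an illustration. The counts (15 of 15 and 14 of 15 locus pairs) are taken from the example results in this section, and the object names are illustrative.

```r
n_pairs <- 15  # number of locus pairs compared

# All 15 locus pairs have a higher value in unphased than in phased data
p_all <- binom.test(15, n_pairs, p = 0.5)$p.value

# 14 of 15 locus pairs have a higher value in unphased data
p_14 <- binom.test(14, n_pairs, p = 0.5)$p.value

# No locus pair has a higher value in unphased data (the other extreme)
p_none <- binom.test(0, n_pairs, p = 0.5)$p.value

c(p_all = p_all, p_14 = p_14, p_none = p_none)
# p_all and p_none are both 2 * 0.5^15 = 6.103516e-05; p_14 is 32 / 2^15 = 0.0009765625
```

Because the test is two-sided, 15 of 15 and 0 of 15 give the same p-value, which is why the p-value alone does not indicate the direction of a trend.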
The function also reports the total number of locus pairs evaluated, and the number of locus pairs with equal values. These data can be reported in three ways: as a returned data frame, as a table written to the console, and as a CSV file written to a user-specified directory. All three report formats are illustrated below. Note that only the significance of the sign test is reported; when a significant trend is indicated, the directionality of the trend is not reported.

```
> LD.res <- LD.sign.test("hla-family-data")
> View(LD.res)
```

Returned Data Frame

```
                             D'           Wn   WLoc1/Loc2   WLoc2/Loc1 N_Haplotypes
#unphased > phased 1.500000e+01 1.400000e+01 1.500000e+01 1.500000e+01 0.000000e+00
#unphased = phased 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#locus pairs       1.500000e+01 1.500000e+01 1.500000e+01 1.500000e+01 1.500000e+01
p-values           6.103516e-05 9.765625e-04 6.103516e-05 6.103516e-05 6.103516e-05
```

```
> LD.sign.test("hla-family-data", returnFrame = FALSE)
```

Console Table

```
Sign Test results for the hla-family-data dataset for 15 locus pairs.
Measure       #U > P  #U = P  p-value
D'            15      0       6.104e-05
Wn            14      0       0.0009766
WLoc1/Loc2    15      0       6.104e-05
WLoc2/Loc1    15      0       6.104e-05
# Haplotypes  0       0       6.104e-05
```

CSV File

```
,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
#unphased > phased,15,14,15,15,0
#unphased = phased,0,0,0,0,0
#locus pairs,15,15,15,15,15
p-values,6.10351562500001e-05,0.0009765625,6.10351562500001e-05,6.10351562500001e-05,6.10351562500001e-05
```

### LD.heat.map

The *LD.heat.map()* function generates heat-map plots for each LD measure, visualizing the range of LD values for each locus pair analyzed using *LDWrap()*. LD values for phased data are presented in the upper half of the plot matrix, while values for unphased data are presented in the lower half of the plot matrix. Color (blue to white to red) or greyscale (dark grey to light grey) plots can be specified.
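As a rough illustration of a blue-to-white-to-red scale for LD values between 0 and 1, the sketch below builds a palette with *colorRampPalette()* from the *grDevices* package. This is an assumption about how such a scale could be constructed, not the code *LD.heat.map()* uses; `ld_palette` and `ld_colour` are hypothetical names.

```r
# 101 colours running from blue (LD = 0) through white (LD = 0.5) to red (LD = 1)
ld_palette <- grDevices::colorRampPalette(c("blue", "white", "red"))(101)

# Map LD values in [0, 1] onto the palette (hypothetical helper, not part of pould)
ld_colour <- function(ld) {
  ld_palette[round(ld * 100) + 1]
}

ld_colour(c(0, 0.5, 1))
```

With an odd number of palette entries, an LD value of exactly 0.5 lands on the white midpoint, so values above and below 0.5 are shaded toward red and blue respectively.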
When `writePlot=TRUE`, PNG-formatted LD plots will be written to a directory identified using the `writeDir` parameter; this parameter defaults to the directory specified by *tempdir()*. When the `dataName` parameter is provided, heat-map plots are written with the name `__heatmap.png`. When the `phasedData` and `unphasedData` parameters are provided, the heat-map plot for each LD measure is named `-__heatmap.png`.

When no *LDWrap()* results are available, or when the loci differ between the datasets specified by `phasedData` and `unphasedData`, a notification is written to the console. There are no other outputs.

```
> LD.heat.map("family-data")
```

Console Notification

```
Files family-data_Phased_LD_results.csv and family-data_Unphased_LD_results.csv are not found.
```

```
> LD.heat.map(phasedData="my-data_phased.csv",unphasedData="my-data_unphased.csv")
```

Console Notification

```
The specified phased and unphased datafiles are not found.
```

```
> LD.heat.map(phasedData="my-ABC-data_phased.csv",unphasedData="my-ABDRB-data-unphased.csv")
```

Console Notification

```
Different loci included in my-ABC-data_phased.csv and my-ABDRB-data-unphased.csv. Cannot produce heatmaps.
```

## Parameters

### cALD

```
cALD(dataSet, inPhase = FALSE, verbose = TRUE, saveVector = FALSE,
     vectorName = "", vectorPrefix = "", vecDir = tempdir())
```

**dataSet** Class: Character. Required. No Default.
e.g., `dataSet="foo.txt"` or `dataSet=drb1.dqb1.demo`
Identifies the genotype data to be analyzed. `dataSet` should identify a four-column data frame or tab-delimited text file. Columns 1 and 2 contain genotype data for one genetic locus, with one allele in each column. Columns 3 and 4 contain genotype data for the second locus. The headers for columns 1 and 2 identify the first locus, and must be identical. Similarly, the headers for columns 3 and 4 identify the second locus, and must be identical.

**inPhase** Class: Logical. Required. Default=FALSE.
Identifies how the genotype data should be analyzed. When `inPhase=FALSE`, the expectation-maximization (EM) algorithm is applied to the genotype data to generate estimated haplotypes. LD values are calculated for those EM haplotypes. When `inPhase=TRUE`, the genotype data are treated as phased, with the alleles in column 1 in phase with those in column 3, and the alleles in column 2 in phase with those in column 4. LD values are calculated for those phased haplotypes.

**verbose** Class: Logical. Required. Default=TRUE.
Identifies how LD values should be reported. When `verbose=TRUE`, values for *D'*, *W_n*, *W_B/A*, *W_A/B* and the number of haplotypes evaluated are written to the console in a human-readable form. When `verbose=FALSE`, those values are returned in a vector of length 5 and mode "character".

**saveVector** Class: Logical. Required. Default=FALSE.
Specifies if a file containing the haplotype vector for the analyzed locus pair should be exported. If `saveVector=FALSE`, no vector is written. If `saveVector=TRUE`, a tab-delimited text file consisting of five columns is written to the directory specified by `vecDir`. The file includes the columns "Dataset", "Phase", the name of the analyzed locus pair, "Frequency", and "Count"; the latter two describe the frequency and count data for all possible haplotypes.

**vectorName** Class: Character. Optional. No Default. (`saveVector=TRUE` specific).
Provides a name for the tab-delimited haplotype vector file written when `saveVector=TRUE`. When `vectorName="foo"` and `saveVector=TRUE`, the name of the haplotype vector file will be "foo.txt". When `vectorName=""` and `saveVector=TRUE`, the name of the haplotype vector file will include the loci analyzed and a time-stamp, formatted as "Locus1~Locus2_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt".

**vectorPrefix** Class: Character. Optional. No Default. (`saveVector=TRUE` specific).
Applies a prefix that includes the applied phase-status to the name of the tab-delimited haplotype vector file written when `saveVector=TRUE` and `vectorName=""`. If any text string is provided for `vectorName`, this parameter is ignored. E.g., when `saveVector=TRUE`, `vectorName=""`, `vectorPrefix="foo"` and `inPhase=FALSE`, the haplotype vector file will be named "foo_unphased_Locus1~Locus2_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt". When the `LDWrap()` function directs `cALD()` to write files containing haplotype vectors for each locus pair, `LDWrap()` provides the file information to `cALD()` via the `vectorPrefix` parameter. The resulting files will contain the name of the family haplotype dataset processed by `LDWrap()`, the phase-status, haplotype pair-name, and a timestamp. If a positive `trunc` parameter was provided to `LDWrap()`, the truncation level will appear in the names of these haplotype vector files.

**vecDir** Class: Character. Optional. Default=tempdir(). (`saveVector=TRUE` specific).
Specifies the directory into which the haplotype vector file should be written.

### LDWrap

```
LDWrap(famData, threshold = 10, phased = TRUE, frameName = "hla-family-data",
       trunc = 0, writeTo = tempdir())
```

**famData** Class: Character. Required. No Default.
e.g., `famData="foo.csv"` or `famData=hla.hap.demo`
Identifies the haplotype or genotype dataset to be analyzed. For haplotype data, `famData` should identify a data frame or CSV file. This dataset must include columns with the headers "Relation" and "Gl String". If either (or both) of these column headers is not found in the dataset, or if the data file is not a CSV file, *LDWrap()* will halt the analysis with a notification about the missing header(s). For genotype data, `famData` should identify a data frame or tab-delimited text file with two columns of allele data for each locus. See the Functions and Input Formats section above for additional details about these dataset formats.

**threshold** Class: Numeric. Required. Default=10.
Identifies the minimum number of subjects with haplotype data for a given locus pair required for the analysis of that locus pair. Analysis for that locus pair is not performed if the threshold is not met, and the LD results file will identify the threshold value and the number of subjects with data for that locus pair. If `threshold` is set to less than 1, it is automatically set to 1.

**phased** Class: Logical. Required. Default=TRUE.
Specifies whether the haplotype data should be treated as phased (`phased=TRUE`) or unphased (`phased=FALSE`) for analysis.

**frameName** Class: Character. Optional. Default="hla-family-data".
Provides a name that will be included in the names of the result files if `famData` specifies a data frame. The value of `frameName` is passed to *cALD()* as the `vectorPrefix` parameter. If `famData` specifies a file, `frameName` is ignored.

**trunc** Class: Numeric. Required. Default=0.
Specifies the number of fields to which colon-delimited allele names in `famData` should be truncated. The default value of 0 indicates no truncation. A value higher than the number of fields in the supplied allele data will result in no truncation. When a positive value of `trunc` is provided, the names of the output files will include the specified truncation level.

**writeTo** Class: Character. Optional. Default=tempdir().
Specifies the directory into which *LDWrap()* should write files.

### LD.sign.test

```
LD.sign.test(dataName, verbose = TRUE, returnFrame = TRUE, resultDir = tempdir())
```

**dataName** Class: Character. Required. No Default.
e.g., `dataName="foo"`
The "base" name of the "_LD_results.csv" files generated by *LDWrap()*, with the "_Phased_LD_results.csv" or "_Unphased_LD_results.csv" suffixes removed.
This corresponds to the value of the *LDWrap()* `frameName` parameter when the *LDWrap()* `famData` parameter does not specify a file; e.g., when specifying the "_LD_results.csv" files generated by *LDWrap()* for the `hla.hap.demo` data included with this package, `dataName="hla-family-data"`. If the corresponding "_Phased_LD_results.csv" or "_Unphased_LD_results.csv" files are not found, the function will halt with a notification identifying the file(s) that are not found.

**verbose** Class: Logical. Required. Default=TRUE.
Identifies if messages about function progress and results should be displayed in the console (`verbose=TRUE`) or not (`verbose=FALSE`). With `verbose=FALSE`, the table of results and messages about missing input files are suppressed, as are messages regarding locus pairs for which no LD values are available and messages about discrepancies between locus pairs with available data in the phased and unphased datasets.

**returnFrame** Class: Logical. Required. Default=TRUE.
Identifies if a data frame of results should be returned (`returnFrame=TRUE`). If `returnFrame=FALSE`, a CSV file of results named "_LD-sign-test_results.csv" is written in the directory specified by the `resultDir` parameter.

**resultDir** Class: Character. Optional. Default=tempdir().
Specifies the directory into which *LD.sign.test()* should write the CSV file of results.

### LD.heat.map

```
LD.heat.map(dataName = "", phasedData = "", unphasedData = "",
            phasedLabel = "Phased", unphasedLabel = "EM-estimated",
            color = TRUE, writePlot = FALSE, writeDir = tempdir())
```

**dataName** Class: Character. Optional. Default="".
e.g., `dataName="foo"`
The "base" name of the "_LD_results.csv" files generated by *LDWrap()*, with the "_Phased_LD_results.csv" or "_Unphased_LD_results.csv" suffixes removed. This corresponds to the value of the *LDWrap()* `frameName` parameter when the *LDWrap()* `famData` parameter does not specify a file; e.g., when specifying the "_LD_results.csv" files generated by *LDWrap()* for the `hla.hap.demo` data included with this package, `dataName="hla-family-data"`.
If the corresponding "_Phased_LD_results.csv" and "_Unphased_LD_results.csv" files are not found, the function will halt with a notification identifying the files that are not found. If this parameter is omitted, `phasedData` and/or `unphasedData` must be provided.

**phasedData** Class: Character. Optional. Default="".
e.g., `phasedData="phased-data.csv"`
The complete name of a file of phased LD results generated by *LDWrap()*. Provide this filename if no "base" name is provided for `dataName` and you want to generate heat-maps for a specific set of phased LD values.

**unphasedData** Class: Character. Optional. Default="".
e.g., `unphasedData="unphased-data.csv"`
The complete name of a file of unphased LD results generated by *LDWrap()*. Provide this filename if no "base" name is provided for `dataName` and you want to generate heat-maps for a specific set of unphased LD values.

**phasedLabel** Class: Character. Required. Default="Phased".
e.g., `phasedLabel="Pedigree Phased"`
Specifies the label that should appear on the heat-map plots for the upper (presumed phased) half of the plot.

**unphasedLabel** Class: Character. Required. Default="EM-estimated".
e.g., `unphasedLabel="EM Haplotypes"`
Specifies the label that should appear on the heat-map plots for the lower (presumed unphased) half of the plot.

**color** Class: Logical. Required. Default=TRUE.
e.g., `color=FALSE`
Identifies if the heat-map plots should be generated in color (`color=TRUE`) or greyscale (`color=FALSE`). Color heat-map plots will range from blue (low LD values) to white (LD values of 0.5) to red (high LD values). Greyscale heat-map plots will range from dark grey (low LD values) to light grey (high LD values).

**writePlot** Class: Logical. Required. Default=FALSE.
Identifies if the heat-map plots should be automatically saved after they are generated.

**writeDir** Class: Character. Optional. Default=tempdir().
The directory into which the heat-map plots should be saved when `writePlot=TRUE`. The default is the directory specified by *tempdir()*.

## Examples

### cALD()

```
# Analyzing the included HLA-DRB1 HLA-DQB1 genotype data and reporting results in the console
cALD(drb1.dqb1.demo)

# Alternatively returning a vector of LD results, with nothing reported in the console
LDvec <- cALD(drb1.dqb1.demo,verbose=FALSE)
LDvec

# Enforcing phase between columns 1 and 3 and between columns 2 and 4 for analysis
cALD(drb1.dqb1.demo,inPhase=TRUE)

# Writing the haplotype vector to a file in the temporary directory that will
# have the time-stamped name DRB1~DQB1_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt
cALD(drb1.dqb1.demo,saveVector = TRUE)

# Writing the haplotype vector to a file named "foo.txt" in the temporary directory
cALD(drb1.dqb1.demo,saveVector = TRUE,vectorName = "foo")

# Writing the haplotype vector to a file in the temporary directory with the prefix "foo_"
cALD(drb1.dqb1.demo,saveVector = TRUE,vectorPrefix = "foo")

# Writing the haplotype vector to a file in the working directory
cALD(drb1.dqb1.demo,saveVector = TRUE,vecDir = getwd())
```

### LDWrap()

```
# Analyzing the included HLA haplotype data
# This will create 15 haplotype vector files and one LD results file in the temporary directory
LDWrap(hla.hap.demo)

# Specifying the prefix "foo_Phased" for the LD results file, and "foo_phased" for the haplotype vector files
LDWrap(hla.hap.demo,frameName = "foo")

# Truncating the alleles in hla.hap.demo to 1 field for analysis.
LDWrap(hla.hap.demo,frameName = "foo", trunc=1)

# Analyzing the included HLA genotype data
LDWrap(drb1.dqb1.demo,frameName="hla-genotype-data")

# Writing the resulting files to the working directory
LDWrap(hla.hap.demo,writeTo = getwd())
```

### LD.sign.test()

```
# Generating LDWrap() results files for the example data included with this package in the temporary directory
LDWrap(hla.hap.demo)
LDWrap(hla.hap.demo,phased=FALSE)

# Analyzing the results files generated by LDWrap(), with a CSV of the results written to the temporary directory.
LDdata <- paste(tempdir(),"hla-family-data",sep=.Platform$file.sep)
LD.sign.test(LDdata, returnFrame=FALSE)

# Returning only a data frame for the same analysis.
LD.res <- LD.sign.test(LDdata,verbose=FALSE)
View(LD.res)

# Writing the CSV file to the working directory
LD.sign.test(LDdata,returnFrame = FALSE,resultDir = getwd())
```

### LD.heat.map()

```
# Generating LDWrap() results files for the example data included with this package in the working directory
LDWrap(hla.hap.demo, writeTo=getwd())
LDWrap(hla.hap.demo,phased=FALSE, writeTo=getwd())

# Generating color heat-map plots based on the LD result files in the working directory
LD.heat.map("hla-family-data")

# Generating greyscale heat-map plots based on the LD result files in the working directory
LD.heat.map("hla-family-data",color=FALSE)

# Generating heat-map plots for phased data alone based on the LD result files in the working directory
LD.heat.map(phasedData="hla-family-data_Phased_LD_results.csv",unphasedLabel="")

# Writing color PNG-formatted heat-map plots to the working directory, using the LD files in the working directory
LD.heat.map("hla-family-data",writePlot = TRUE,writeDir = getwd())
```

## End of vignette.