Package Version: 1.0.1

Overview

The pould package (pronounced “pooled”) calculates four linkage disequilibrium (LD) statistics – D^’, W_n and the conditional asymmetric LD (cALD) measures, W_A/B and W_B/A, for genotype data from pairs of genetic loci, and can treat these data as either phased or unphased for these calculations. The package includes a wrapper function that parses either column-formatted genotypes or multi-locus haplotypes in the 17th International HLA and Immunogenetics Workshop (IHIW) Family Haplotype Project’s HaplObserve output format. This wrapper function generates output files in a user-defined directory.

The package includes a function that applies a sign test to LD values for phased and unphased haplotypes generated by the wrapper function for haplotype- or genotype-formatted datasets, and a function that generates heat-maps for each LD measure. For more information, see:
Osoegawa et al. Hum Immunol. 2019;80(9):633-643.
Osoegawa et al. Hum Immunol. 2019;80(9):644-660.

For information about these LD measures, see:
Hedrick PW. Genetics. 1987;117(2):331-41.
Cramér H. Mathematical Methods of Statistics. Princeton, NJ: Princeton University Press; 1946.
Thomson G, Single RM. Genetics. 2014;198(1):321-31.

The pould package accepts genotype and haplotype data for individual subjects as input data. To calculate the cALD measures using haplotype frequency data, try the asymLD package. See Single et al. Hum Immunol. 2016;77(3):288-294 for more information about asymLD.

Functions and Input Formats

cALD

The cALD() function calculates the LD measures from genotype data for pairs of loci. The drb1.dqb1.demo dataset, shown below, illustrates the input format. The function accepts a 4-column data frame or tab-delimited text file with data for the first locus in columns 1 and 2, and data for the second locus in columns 3 and 4.

> drb1.dqb1.demo[1:6,]

##    DRB1  DRB1  DQB1  DQB1
## 1 15:01 07:01 06:02 03:03
## 2 04:05 13:01 03:02 06:03
## 3 04:01 13:02 03:02 06:09
## 4 16:02 04:05 05:02 03:02
## 5 15:01 07:01 06:02 02:02
## 6 04:01 04:01 03:02 03:02

The locus names should be used as column headers, with one allele/variant in each column. Each row represents a single subject. The headers for columns 1 and 2 must be identical, as must the headers for columns 3 and 4. If phase is known, cALD() assumes that columns 1 and 3 represent one haplotype, and that columns 2 and 4 represent the second haplotype.
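As a minimal sketch of this layout, the base R code below builds a data frame with repeated locus headers and reads off the two haplotypes implied by known phase. The allele values are copied from the first three rows of drb1.dqb1.demo shown above; the object names geno, hap1 and hap2 are illustrative only and are not part of the package.

```r
# Build a four-column genotype table. Assigning names() afterwards keeps the
# repeated locus headers, which data.frame() would otherwise make unique.
geno <- data.frame(
  c("15:01", "04:05", "04:01"),
  c("07:01", "13:01", "13:02"),
  c("06:02", "03:02", "03:02"),
  c("03:03", "06:03", "06:09"),
  stringsAsFactors = FALSE
)
names(geno) <- c("DRB1", "DRB1", "DQB1", "DQB1")

# The headers must match within each locus.
stopifnot(names(geno)[1] == names(geno)[2], names(geno)[3] == names(geno)[4])

# With known phase, columns 1 and 3 form one haplotype and columns 2 and 4
# form the other.
hap1 <- paste(geno[[1]], geno[[3]], sep = "~")
hap2 <- paste(geno[[2]], geno[[4]], sep = "~")
hap1
hap2
```

Columns are addressed by position here because the repeated headers make name-based access ambiguous.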

While HLA data are shown above, cALD() can accept any genetic data. The example below combines data for HLA-DQA1 and rs7743506.

      HLA-DQA1       HLA-DQA1     rs7743506     rs7743506
        02:01          01:02           C             C
        04:01          01:02           A             C
        04:01          05:01           A             C
        01:01          01:02           C             C
        04:01          01:01           A             C
        05:01          01:02           C             C

LDWrap

The LDWrap() function parses either genotype data formatted in a two-column per locus format, or haplotype data formatted using the 17th International HLA and Immunogenetics Workshop Family Haplotype Project’s GL String-based format. The function accepts a data frame, a tab-delimited (.txt or .tsv) columnar genotype file, or a comma-separated value (.csv) GL String haplotype file, and passes genotype data for all pairs of loci in that dataset to cALD() for LD analysis.

Input Formats

Haplotype Data

A minimal LDWrap() haplotype data file or data frame contains two columns named “Relation” and “Gl String”. Other columns are allowed, but are ignored. The hla.hap.demo dataset (shown below in edited form) illustrates the input format. Each row contains data for a single subject. The “Relation” column can contain any text string; however, values such as “mother”, “father” and “child” are standard for the Family HLA Data Project. LDWrap() will ignore all rows in which Relation=child; rows with any other value in the “Relation” column will be processed.

Relation    Gl String
Subject HLA-A*02:01~HLA-C*07:02~HLA-B*07:02+HLA-A*01:01~HLA-C*06:02~HLA-B*57:01
Subject HLA-A*03:01~HLA-C*07:01~HLA-B*49:01+HLA-A*01:01~HLA-C*07:01~HLA-B*08:01
Subject HLA-A*11:01~HLA-C*04:01~HLA-B*15:01+HLA-A*03:01~HLA-C*08:02~HLA-B*14:02
Subject HLA-A*68:01~HLA-C*15:02~HLA-B*40:06+HLA-A*68:01~HLA-C*06:02~HLA-B*45:01

The “Gl String” column contains a GL String formatted multi-locus HLA haplotype. In GL String format, the ~ operator denotes phase, and the + operator denotes copies of genes (in this case diploidy). While GL Strings can be used to describe ambiguous alleles and genotypes using the / and | operators, ambiguous data cannot be included in an LDWrap() data file. LDWrap() requires that alleles be described as LOCUS*VARIANT (e.g., HLA-DRB1*01:01); locus prefixes (e.g., HLA-) are not required, but if locus prefixes are included, all loci must be described using the same prefix. Allele data described without a locus (e.g., 01:01) are not allowed. Unusual allele names (HLA-A*NULL, HLA-DRB1*NoMatch, HLA-DPB1*NT) and truncated versions of allele names (HLA-A*01, HLA-A*01:01, HLA-A*01:01:01, etc.) will be analyzed as distinct alleles, and may skew analytic results. LDWrap() includes an option to truncate colon-delimited allele names to specific numbers of fields for analysis.
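The structure of this input can be illustrated with a short base R sketch. It is not LDWrap()'s own parser: the helper parse_gl() and the two-row data frame fam are hypothetical, and the sketch assumes unambiguous GL Strings in which both haplotypes list the same loci in the same order.

```r
# Hypothetical two-row input in the "Relation" / "Gl String" layout.
fam <- data.frame(
  Relation = c("mother", "child"),
  `Gl String` = c(
    "HLA-A*02:01~HLA-C*07:02~HLA-B*07:02+HLA-A*01:01~HLA-C*06:02~HLA-B*57:01",
    "HLA-A*03:01~HLA-C*07:01~HLA-B*49:01+HLA-A*01:01~HLA-C*07:01~HLA-B*08:01"
  ),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Rows with Relation == "child" are skipped, as described above.
fam <- fam[fam$Relation != "child", ]

# Split one GL String into a locus-by-haplotype matrix of variant names.
parse_gl <- function(gl) {
  haps <- strsplit(gl, "+", fixed = TRUE)[[1]]          # "+" separates the two haplotypes
  alleles <- strsplit(haps, "~", fixed = TRUE)          # "~" separates phased loci
  variants <- sapply(alleles, function(a) sub("^[^*]*\\*", "", a))
  rownames(variants) <- sub("\\*.*$", "", alleles[[1]]) # locus names from haplotype 1
  colnames(variants) <- paste0("hap", seq_along(haps))
  variants
}

parsed <- parse_gl(fam[["Gl String"]][1])
parsed
```

Each column of the resulting matrix is one phased haplotype, so any two rows give the phased genotype for that locus pair.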

Genotype Data

A minimal LDWrap() genotype data file or data frame contains two columns per locus, with one allele in each column, as for cALD(), but accommodating more than two loci. Columns for the same locus must be adjacent and can have identical names, or can be suffixed with “_1” and “_2”. Columns named “SampleID” and “Disease” are permitted, but not required. No other columns are allowed. Allele names in each column can include a locus name (e.g., locus*allele); if locus names are not included, the locus name in the header will be associated with each allele in that column.
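The header rules above can be checked before running an analysis. The sketch below is an illustration in base R, not a pould function; the header vector hdr is a made-up example that mixes both permitted naming styles.

```r
# Hypothetical header row for a three-locus genotype file.
hdr <- c("SampleID", "A_1", "A_2", "C_1", "C_2", "B", "B")

# Drop the permitted non-locus columns, then strip any "_1"/"_2" suffixes.
locus_cols <- hdr[!hdr %in% c("SampleID", "Disease")]
locus_names <- sub("_[12]$", "", locus_cols)

# Each locus should occupy two adjacent columns.
first <- locus_names[c(TRUE, FALSE)]
second <- locus_names[c(FALSE, TRUE)]
paired <- length(locus_names) %% 2 == 0 && all(first == second)

paired
unique(locus_names)
```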

Locus Order

The order in which the loci appear, either in columns or in the GL String haplotype, affects the identification of haplotype locus pairs for analysis. For example, if HLA loci are organized alphabetically, haplotypes of the HLA-B and HLA-C loci will be analyzed as B~C haplotypes; if they are organized by map order, those haplotypes will be analyzed as C~B haplotypes. B~C and C~B haplotypes will not be recognized as the same by LD.heat.map(). To avoid this, it is recommended to use the same organization of loci for all analyses, and to use map order for HLA or KIR loci.
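The effect of locus order on pair names can be seen with combn() from base R. This is only a sketch of the naming issue, and it assumes pairs are labelled in the order the loci appear; it does not call any pould function.

```r
alphabetical <- c("A", "B", "C")
map_order <- c("A", "C", "B")

# Label every locus pair in the order the loci appear.
pairs_alpha <- combn(alphabetical, 2, paste, collapse = "~")
pairs_map <- combn(map_order, 2, paste, collapse = "~")

pairs_alpha  # "A~B" "A~C" "B~C"
pairs_map    # "A~C" "A~B" "C~B"

# Labels produced by one ordering but not the other.
setdiff(pairs_alpha, pairs_map)  # "B~C"

# Number of locus pairs for n loci is n(n-1)/2.
choose(length(map_order), 2)  # 3
```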

LD.sign.test

The LD.sign.test() function applies the R Stats Package’s binom.test() function to pairs of LD values (D’, Wn, WLoc1/Loc2, WLoc2/Loc1), as well as the number of haplotypes, for phased and unphased haplotypes. The *_LD_results.csv* files generated by LDWrap() are the input files for this function, and generally, LDWrap() must be used before this function can be applied. See the LDWrap() Outputs section below for an example.

LD.heat.map

The LD.heat.map() function generates heat-map plots of the LD values generated for each LD measure (D’, Wn, WLoc1/Loc2, and WLoc2/Loc1) for phased and unphased haplotypes. If LD values for only phased or unphased haplotypes are available, half-matrix heat-maps will be generated. The *_LD_results.csv* files generated by LDWrap() are the input files for this function, and generally, LDWrap() must be used before this function can be applied. See the LDWrap() Outputs section below for an example.

Outputs

cALD()

By default, cALD() operates in “verbose” mode, and will write five lines of output to the console describing the phase-status of the LD analysis (phased or unphased), the loci and number of haplotypes analyzed, and the four LD measures calculated, as shown below.

> cALD(drb1.dqb1.demo)
## Calculating D', Wn and conditional ALD for 53 unphased genotypes at the DRB1 and DQB1 loci.
## D' for DRB1~DQB1 haplotypes: 0.95892767844544 (0.9589) 
## Wn for DRB1~DQB1 haplotypes: 0.811250972337927 (0.8113) 
## Variation of DQB1 conditioned on DRB1 (WDQB1/DRB1) = 0.904035615838528 (0.904)
## Variation of DRB1 conditioned on DQB1 (WDRB1/DQB1) = 0.778712696009626 (0.7787)

When verbose=FALSE, cALD() returns a vector of D^’, W_n, W_B/A, W_A/B and the number of haplotypes, as below.

> cALDres <- cALD(drb1.dqb1.demo, verbose=FALSE)
> cALDres
## [1] "0.958463650196244" "0.811184752436694" "0.903300938910147" "0.778712697633606" "53"

In addition, when saveVector=TRUE, cALD() will write a text file, containing a vector of all haplotypes, their frequencies and counts for the analyzed locus pair, to a user-specified directory. Unless specified via the vecDir parameter, this file is written to the directory specified by tempdir(). This vector file also includes information on the dataset and phase status applied to the genotype data for the analysis. An example generated for the drb1.dqb1.demo dataset is shown below.

> cALD(drb1.dqb1.demo,saveVector = TRUE)

Haplotype vector file contents:

Dataset                                         Phase   DRB1~DQB1   Frequency           Count
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   01:01~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   01:02~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   01:03~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   03:01~02:01 0.094272076372315   79
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   04:01~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   04:02~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   04:03~02:01 0                   0
DRB1~DQB1_haplotype_Vector_2018-05-02_16-53-12  FALSE   04:04~02:01 0                   0
.
.
.

LDWrap

LDWrap() sends data to cALD(), captures the vector of LD results returned by cALD(), directs cALD() to write vectors of haplotypes for each locus pair into a user-specified directory, and writes a single table of LD results for all locus pairs to that same directory. As a single haplotype vector file is written for each locus pair, LDWrap() directs cALD() to write n(n-1)/2 haplotype vector files, where n is the number of loci in a haplotype. The only information LDWrap() returns to the console is a notification that the analysis has completed (LD Analysis Complete) and notifications that the provided dataset is missing the required columns. If the user specifies no destination for these files, they are written to the directory specified by tempdir().

When all locus pairs have been analyzed by cALD(), LDWrap() writes a six-column CSV file (*LD_results.csv) of aggregated LD result vectors collected from cALD() to the specified directory. The column headers in this file are Loc1~Loc2 (identifying the locus pair), D^’, W_n, W_B/A, W_A/B and N_Haplotypes. An example of this file is shown below.

> LDWrap(hla.hap.demo)
LD Analysis Complete

LD results file contents:

Loc1~Loc2,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
A~C,0.469024805013898,0.362566555750013,0.366359427624652,0.384413960789992,191
A~B,0.540780240853345,0.446662593270748,0.36839918931955,0.471334711300434,241
A~DRB1,0.400002012804198,0.335434108343871,0.27413544158564,0.320399398449896,233
A~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=8,.,
.
.
.

LDWrap() attempts to perform these LD calculations for all pairs of loci in the LDWrap() data file. If a haplotype dataset includes locus pairs for which the number of subjects is below the threshold value (see “Parameters”, below), the *LD_results.csv file will include rows for locus pairs for which no LD calculations were performed. As shown above, those rows contain data similar to that shown for the A~DRB3 haplotype – Not Calculated, Subject Threshold=10, Complete subjects=8, ., and ’ ’.

As shown below, in cases where at least one locus in a pair is monomorphic, no LD calculations are performed and the pertinent rows in the *LD_results.csv file will contain Not Calculated, Subject Threshold=10, Complete subjects=n (the number of complete subjects), "locusName" is monomorphic., and ’ ’.

Loc1~Loc2,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
B~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=130,DRB3 is monomorphic.,
DRB3~DRB4,Not Calculated,Subject Threshold=10,Complete subjects=130,DRB3 is monomorphic. DRB4 is monomorphic.,
DRB3~DQA1,Not Calculated,Subject Threshold=10,Complete subjects=93,DRB3 is monomorphic.,
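Because rows marked Not Calculated place text in the numeric columns, a results file needs a little care when read back into R. The sketch below uses base R only and an inline copy of rows shown in this section; the object names are illustrative.

```r
# Inline copy of a few result rows from this section.
csv_text <- "Loc1~Loc2,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
A~C,0.469024805013898,0.362566555750013,0.366359427624652,0.384413960789992,191
A~B,0.540780240853345,0.446662593270748,0.36839918931955,0.471334711300434,241
A~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=8,.,
B~DRB3,Not Calculated,Subject Threshold=10,Complete subjects=130,DRB3 is monomorphic.,"

# check.names = FALSE keeps headers such as "D'" and "Loc1~Loc2" unchanged.
res <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# Separate the locus pairs that were skipped from those with LD values.
skipped <- res[["D'"]] == "Not Calculated"
ld <- res[!skipped, ]
ld[["D'"]] <- as.numeric(ld[["D'"]])
ld[["Wn"]] <- as.numeric(ld[["Wn"]])

res[skipped, "Loc1~Loc2"]
ld
```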

LD.sign.test

For each LD measure and the number of haplotypes for phased and unphased versions of the same genotype data, LD.sign.test() reports the p-value of the sign test, comparing the number of locus pairs for which the value of the measure is higher in unphased haplotypes than phased haplotypes to the number of locus pairs for which that value is lower or equal. The function also reports the total number of locus pairs evaluated, and the number of locus pairs with equal values. These data can be reported in three ways: as a returned data frame, as a table written to the console, and as a CSV file written to a user-specified directory. All three report formats are illustrated below. Note that only the significance of the sign test is reported; when a significant trend is indicated, the directionality of the trend is not reported.

> LD.res <- LD.sign.test("hla-family-data")
> View(LD.res)

Returned Data Frame

                             D'           Wn   WLoc1/Loc2   WLoc2/Loc1 N_Haplotypes
#unphased > phased 1.500000e+01 1.400000e+01 1.500000e+01 1.500000e+01 0.000000e+00
#unphased = phased 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#locus pairs       1.500000e+01 1.500000e+01 1.500000e+01 1.500000e+01 1.500000e+01
p-values           6.103516e-05 9.765625e-04 6.103516e-05 6.103516e-05 6.103516e-05

> LD.sign.test("hla-family-data", returnFrame = FALSE)

Console Table

Sign Test results for the hla-family-data dataset for 15 locus pairs.
Measure         #U > P  #U = P    p-value   
D'              15      0         6.104e-05 
Wn              14      0         0.0009766 
WLoc1/Loc2      15      0         6.104e-05 
WLoc2/Loc1      15      0         6.104e-05 
# Haplotypes    0       0         6.104e-05

CSV File

,D',Wn,WLoc1/Loc2,WLoc2/Loc1,N_Haplotypes
#unphased > phased,15,14,15,15,0
#unphased = phased,0,0,0,0,0
#locus pairs,15,15,15,15,15
p-values,6.10351562500001e-05,0.0009765625,6.10351562500001e-05,6.10351562500001e-05,6.10351562500001e-05
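The p-values in these reports are consistent with a two-sided exact binomial test against a success probability of 0.5. The sketch below calls binom.test() directly with the counts from the tables above; treating the total number of locus pairs as the number of trials is an assumption made for this illustration.

```r
n_pairs <- 15

# 15 of 15 locus pairs higher in unphased data (D', WLoc1/Loc2, WLoc2/Loc1).
p_all <- binom.test(15, n_pairs, p = 0.5)$p.value

# 14 of 15 locus pairs higher in unphased data (Wn).
p_14 <- binom.test(14, n_pairs, p = 0.5)$p.value

# 0 of 15 locus pairs higher in unphased data (N_Haplotypes).
p_none <- binom.test(0, n_pairs, p = 0.5)$p.value

c(p_all, p_14, p_none)
```

The first and last p-values are identical even though the trends run in opposite directions, which is why the counts should be read alongside the p-values.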

LD.heat.map

The LD.heat.map() function generates heat-map plots for each LD measure, visualizing the range of LD values for each locus pair analyzed using LDWrap(). LD values for phased data are presented in the upper half of the plot matrix, while values for unphased data are presented in the lower half of the plot matrix. Color (blue to white to red) or greyscale (dark grey to light grey) plots can be specified. When writePlot=TRUE, PNG-formatted LD plots will be written to a directory identified using the writeDir parameter; this parameter defaults to the directory specified by tempdir(). When the dataName parameter is provided, heat-map plots are written with the name <dataName>_<LD measure>_heatmap.png. When the phasedData and unphasedData parameters are provided, the heat-map plot for each LD measure is named <phasedData>-<unphasedData>_<LD measure>_heatmap.png. When no LDWrap() results are available, or when the loci differ between the datasets specified by phasedData and unphasedData, a notification is written to the console. There are no other outputs.

> LD.heat.map("family-data")

Console Notification

Files family-data_Phased_LD_results.csv and family-data_Unphased_LD_results.csv are not found.

> LD.heat.map(phasedData="my-data_phased.csv",unphasedData="my-data_unphased.csv")

Console Notification

The specified phased and unphased datafiles are not found.

> LD.heat.map(phasedData="my-ABC-data_phased.csv",unphasedData="my-ABDRB-data-unphased.csv")

Console Notification

Different loci included in my-ABC-data_phased.csv and my-ABDRB-data-unphased.csv. Cannot produce heatmaps.
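The plot layout described in this section, with phased values above the diagonal and unphased values below it, can be sketched as a plain matrix in base R. The LD values here are invented for illustration, and the code is not taken from LD.heat.map(); it assumes both datasets contain the same locus pairs.

```r
loci <- c("A", "C", "B")

# Hypothetical LD values for the same locus pairs in both datasets.
phased   <- c("A~C" = 0.36, "A~B" = 0.45, "C~B" = 0.80)
unphased <- c("A~C" = 0.40, "A~B" = 0.50, "C~B" = 0.85)

ld_matrix <- matrix(NA_real_, length(loci), length(loci),
                    dimnames = list(loci, loci))
for (pair in names(phased)) {
  idx <- sort(match(strsplit(pair, "~", fixed = TRUE)[[1]], loci))
  ld_matrix[idx[1], idx[2]] <- phased[[pair]]    # upper half: phased
  ld_matrix[idx[2], idx[1]] <- unphased[[pair]]  # lower half: unphased
}
ld_matrix

# A blue-white-red ramp similar to the color scheme described above.
palette_fn <- colorRampPalette(c("blue", "white", "red"))
```

A matrix like this could be passed to a plotting function such as image() together with palette_fn(n) for some number of color steps n.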

Parameters

cALD

cALD(dataSet, inPhase = FALSE, verbose = TRUE, saveVector = FALSE, vectorName = "", vectorPrefix = "", vecDir = tempdir())

dataSet

Class: Character. Required. No Default.

e.g., dataSet="foo.txt" or dataSet=drb1.dqb1.demo

Identifies the genotype data to be analyzed. dataSet should identify a four-column data-frame or tab-delimited text file. Columns 1 and 2 contain genotype data for one genetic locus, with one allele in each column. Columns 3 and 4 contain genotype data for the second locus. The headers for columns 1 and 2 identify the first locus, and must be identical. Similarly, the headers for columns 3 and 4 identify the second locus, and must be identical.

inPhase

Class: Logical. Required. Default=FALSE.

Identifies how the genotype data should be analyzed. When inPhase=FALSE, the expectation-maximization (EM) algorithm is applied to the genotype data to generate estimated haplotypes. LD values are calculated for those EM haplotypes. When inPhase=TRUE, the genotype data are treated as phased, with the alleles in column 1 in phase with those in column 3, and the alleles in column 2 in phase with those in column 4. LD values are calculated for those phased haplotypes.
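For phased data, the calculation starts from a table of observed haplotypes. The sketch below computes D' and Wn from such a table using the standard textbook definitions (D' as the frequency-weighted sum of normalized disequilibrium coefficients, and Wn as Cramér's V). It is an assumption-laden illustration rather than cALD()'s own code: the function name ld_from_haplotypes() is hypothetical, both loci are assumed to be polymorphic, and the EM step used for unphased data is not shown.

```r
# a and b hold the alleles at locus 1 and locus 2, one element per haplotype.
ld_from_haplotypes <- function(a, b) {
  f <- unclass(prop.table(table(a, b)))  # observed haplotype frequencies
  p <- rowSums(f)                        # allele frequencies at locus 1
  q <- colSums(f)                        # allele frequencies at locus 2
  e <- outer(p, q)                       # expected frequencies without LD
  d <- f - e                             # disequilibrium coefficients
  # Largest possible magnitude of each coefficient, given its sign.
  dmax <- ifelse(d < 0,
                 pmin(e, outer(1 - p, 1 - q)),
                 pmin(outer(p, 1 - q), outer(1 - p, q)))
  c(Dprime = sum(e * abs(d) / dmax),
    Wn = sqrt(sum(d^2 / e) / (min(length(p), length(q)) - 1)))
}

# Complete association between alleles: both measures equal 1.
perfect <- ld_from_haplotypes(c("a1", "a1", "a2", "a2"),
                              c("b1", "b1", "b2", "b2"))
# No association between alleles: both measures equal 0.
independent <- ld_from_haplotypes(c("a1", "a1", "a2", "a2"),
                                  c("b1", "b2", "b1", "b2"))
perfect
independent
```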

verbose

Class: Logical. Required. Default=TRUE.

Identifies how LD values should be reported. When verbose=TRUE, values for D^’, W_n, W_B/A, W_A/B and the number of haplotypes evaluated are written to the console in a human readable form. When verbose=FALSE, those values are reported in a vector of length 5 and mode of “character”.

saveVector

Class: Logical. Required. Default=FALSE.

Specifies if a file containing the haplotype vector for the analyzed locus pair should be exported. If saveVector=FALSE no vector is written. If saveVector=TRUE a tab-delimited text file consisting of five columns is written to the directory specified by vecDir (tempdir() by default). The file includes the columns “Dataset”, “Phase”, the name of the analyzed locus pair, “Frequency”, and “Count”; the latter two describe the frequency and count data for all possible haplotypes.

vectorName

Class: Character. Optional. No Default. (saveVector=TRUE specific).

Provides a name for the tab-delimited haplotype vector file written when saveVector=TRUE. When vectorName="foo" and saveVector=TRUE, the name of the haplotype vector file will be “foo.txt”. When vectorName="" and saveVector=TRUE, the name of the haplotype vector file will include the loci analyzed and a time-stamp, formatted as “Locus1~Locus2_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt”.

vectorPrefix

Class: Character. Optional. No Default. (saveVector=TRUE specific).

Applies a prefix that includes the applied phase-status to the name of the tab-delimited haplotype vector file written when saveVector=TRUE, when vectorName="". If any text string is provided for vectorName, this parameter is ignored. E.g., when saveVector=TRUE, vectorName="", vectorPrefix="foo" and inPhase=FALSE, the haplotype vector file will be named “foo_unphased_Locus1~Locus2_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt”.

When the LDWrap() function directs cALD() to write files containing haplotype vectors for each locus pair, LDWrap() provides the file information to cALD via the vectorPrefix parameter. The resulting files will contain the name of the family haplotype dataset processed by LDWrap(), the phase-status, haplotype pair-name, and a timestamp. If a positive trunc parameter was provided to LDWrap(), the truncation level will appear in the names of these haplotype vector files.
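The naming rules for haplotype vector files can be sketched as follows. The helper vector_file_name() is hypothetical and is not a pould function; it follows the time-stamp pattern seen in the example vector file earlier in this document (2018-05-02_16-53-12) and omits the truncation level that LDWrap() may add.

```r
# Compose a haplotype vector file name from the loci, an optional prefix,
# the phase status, and a time-stamp.
vector_file_name <- function(locus1, locus2, prefix = "", phased = FALSE,
                             when = Sys.time(), tz = "") {
  stamp <- format(when, "%Y-%m-%d_%H-%M-%S", tz = tz)
  name <- paste0(locus1, "~", locus2, "_haplotype_Vector_", stamp, ".txt")
  if (nzchar(prefix)) {
    # The phase status is only added when a prefix is supplied.
    name <- paste0(prefix, "_", if (phased) "phased" else "unphased", "_", name)
  }
  name
}

# A fixed time is used here so that the result is reproducible.
when <- as.POSIXct("2018-05-02 16:53:12", tz = "UTC")
plain_name <- vector_file_name("DRB1", "DQB1", when = when, tz = "UTC")
prefixed_name <- vector_file_name("DRB1", "DQB1", prefix = "foo",
                                  when = when, tz = "UTC")
plain_name
prefixed_name
```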

vecDir

Class: Character. Optional. Default=tempdir(). (saveVector=TRUE specific).

Specifies the directory into which the haplotype vector file should be written.

LDWrap

LDWrap(famData, threshold = 10, phased = TRUE, frameName = "hla-family-data", trunc = 0, writeTo = tempdir())

famData

Class: Character. Required. No Default.

e.g., famData="foo.csv" or famData=hla.hap.demo

Identifies the haplotype or genotype dataset to be analyzed. For haplotype data, famData should identify a data frame or CSV file. This dataset must include columns with the headers “Relation” and “Gl String”. If either (or both) of these column headers is not found in the dataset, or if the data file is not a CSV file, LDWrap() will halt the analysis with a notification about the missing header(s). For genotype data, famData should identify a data frame or tab-delimited text file with two columns of allele data for each locus. See the Functions and Input Formats section above for additional details about these dataset formats.

threshold

Class: Numeric. Required. Default=10.

Identifies the minimum number of subjects with haplotype data for a given locus pair required for the analysis of that locus pair. Analysis for that locus pair is not performed if the threshold is not met, and the LD results file will identify the threshold value and the number of subjects with data for that locus pair. If threshold is set to less than 1, it is automatically set to 1.

phased

Class: Logical. Required. Default=TRUE.

Specifies whether the haplotype data should be treated as phased (phased=TRUE) or unphased (phased=FALSE) for analysis.

frameName

Class: Character. Optional. Default=“hla-family-data”.

Provides a name that will be included in the names of the result files if famData specifies a data frame. The value of frameName is passed to cALD() as the vectorPrefix parameter. If famData specifies a file, frameName is ignored.

trunc

Class: Numeric. Required. Default=0.

Specifies the number of fields to which colon-delimited allele names in famData should be truncated. The default value of 0 indicates no truncation. A value higher than the number of fields in the supplied allele data will result in no truncation. When a positive value of trunc is provided, the names of the output files will include the specified truncation level.
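The truncation behavior described for trunc can be sketched in base R. The helper truncate_fields() is hypothetical and only mirrors the rules stated above: 0 leaves names unchanged, and a value larger than the number of fields has no effect.

```r
# Truncate colon-delimited allele names to a given number of fields.
truncate_fields <- function(alleles, fields = 0) {
  if (fields < 1) return(alleles)  # 0 means no truncation
  vapply(strsplit(alleles, ":", fixed = TRUE),
         function(f) paste(head(f, fields), collapse = ":"),
         character(1))
}

alleles <- c("HLA-A*01:01:01", "HLA-A*01:01", "HLA-A*01")
two_field <- truncate_fields(alleles, 2)
one_field <- truncate_fields(alleles, 1)
two_field  # "HLA-A*01:01" "HLA-A*01:01" "HLA-A*01"
one_field  # "HLA-A*01" "HLA-A*01" "HLA-A*01"
```

Truncating to a common number of fields makes names such as HLA-A*01:01 and HLA-A*01:01:01 count as the same allele.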

writeTo

Class: Character. Optional. Default=tempdir().

Specifies the directory into which LDWrap() should write files.

LD.sign.test

LD.sign.test(dataName, verbose = TRUE, returnFrame = TRUE, resultDir = tempdir())

dataName

Class: Character. Required. No Default.

e.g., dataName="foo"

The “base” name of the “_LD_results.csv” files generated by LDWrap(), with the “_Phased_LD_results.csv” or “_Unphased_LD_results.csv” suffixes removed. This corresponds to the value of the LDWrap() frameName parameter when the LDWrap() famData parameter does not specify a file; e.g., when specifying the “_LD_results.csv” files generated by LDWrap() for the hla.hap.demo data included with this package, dataName="hla-family-data". If the corresponding “_Phased_LD_results.csv” or “_Unphased_LD_results.csv” files are not found, the function will halt with a notification identifying the file(s) that are not found.

verbose

Class: Logical. Required. Default=TRUE.

Identifies if messages about function progress and results should be displayed in the console (verbose=TRUE) or not (verbose=FALSE). The default is verbose=TRUE. In addition to the table of results and messages about missing input files, messages regarding locus pairs for which no LD values are available, and about discrepancies between locus pairs with available data in the phased and unphased datasets are also suppressed with verbose=FALSE.

returnFrame

Class: Logical. Required. Default=TRUE.

Identifies if a data frame of results should be returned (returnFrame=TRUE). If returnFrame=FALSE, a CSV file of results named “_LD-sign-test_results.csv” is written in the directory specified by the resultDir parameter.

resultDir

Class: Character. Optional. Default=tempdir().

Specifies the directory into which LD.sign.test() should write the CSV file of results.

LD.heat.map

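LD.heat.map(dataName = "", phasedData = "", unphasedData = "", phasedLabel = "Phased", unphasedLabel = "EM-estimated", color = TRUE, writePlot = FALSE, writeDir = tempdir())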

dataName

Class: Character. Optional. Default="".

e.g., dataName="foo"

The “base” name of the “_LD_results.csv” files generated by LDWrap(), with the “_Phased_LD_results.csv” or “_Unphased_LD_results.csv” suffixes removed. This corresponds to the value of the LDWrap() frameName parameter when the LDWrap() famData parameter does not specify a file; e.g., when specifying the “_LD_results.csv” files generated by LDWrap() for the hla.hap.demo data included with this package, dataName="hla-family-data". If the corresponding “_Phased_LD_results.csv” and “_Unphased_LD_results.csv” files are not found, the function will halt with a notification identifying the files that are not found. If this parameter is omitted, phasedData and/or unphasedData must be provided.

phasedData

Class: Character. Optional. Default="".

The complete name of a file of phased LD results generated by LDWrap(). Provide this filename if no “base” name is provided for dataName and you want to generate heat-maps for a specific set of phased LD values.

e.g., phasedData="phased-data.csv"

unphasedData

Class: Character. Optional. Default="".

The complete name of a file of unphased LD results generated by LDWrap(). Provide this filename if no “base” name is provided for dataName and you want to generate heat-maps for a specific set of unphased LD values.

e.g., unphasedData="unphased-data.csv"

phasedLabel

Class: Character. Required. Default=“Phased”.

e.g., phasedLabel="Pedigree Phased"

Specifies the label that should appear on the heat-map plots for the upper (presumed phased) half of the plot.

unphasedLabel

Class: Character. Required. Default=“EM-estimated”.

e.g., unphasedLabel="EM Haplotypes"

Specifies the label that should appear on the heat-map plots for the lower (presumed unphased) half of the plot.

color

Class: Logical. Required. Default=TRUE.

e.g., color=FALSE

Identifies if the heat-map plots should be generated in color (color=TRUE) or greyscale (color=FALSE). The default is color=TRUE. Color heat-map plots will range from blue (low LD values) to white (LD values of 0.5) to red (high LD values). Greyscale heat-map plots will range from dark grey (low LD values) to light grey (high LD values).

writePlot

Class: Logical. Required. Default=FALSE.

Identifies if the heat-map plots should be automatically saved after they are generated.

writeDir

Class: Character. Optional. Default=tempdir().

The directory into which the heat-map plots should be saved when writePlot=TRUE. The default is the directory specified by tempdir().

Examples

###cALD()

# Analyzing the included HLA-DRB1 HLA-DQB1 genotype data and reporting results in the console
cALD(drb1.dqb1.demo)
# Alternatively returning a vector of LD results, with nothing reported in the console
LDvec <- cALD(drb1.dqb1.demo,verbose=FALSE)
LDvec
# Enforcing phase between columns 1 and 3 and between columns 2 and 4 for analysis
cALD(drb1.dqb1.demo,inPhase=TRUE)
# Writing the haplotype vector to a file in the temporary directory that will 
# have the time-stamped name DRB1~DQB1_haplotype_Vector_yyyy-MM-dd_HH-mm-ss.txt
cALD(drb1.dqb1.demo,saveVector = TRUE)
# Writing the haplotype vector to a file named "foo.txt" in the temporary directory  
cALD(drb1.dqb1.demo,saveVector = TRUE,vectorName = "foo")
# Writing the haplotype vector to a file in the temporary directory with the prefix "foo_"
cALD(drb1.dqb1.demo,saveVector = TRUE,vectorPrefix = "foo")
# Writing the haplotype vector to a file in the working directory
cALD(drb1.dqb1.demo,saveVector = TRUE,vecDir = getwd())

###LDWrap()

# Analyzing the included HLA haplotype data 
# This will create 15 haplotype vector files and one LD results file in the temporary directory
LDWrap(hla.hap.demo)
# Specifying the prefix "foo_Phased" for the LD results file, and "foo_phased" for the haplotype vector files
LDWrap(hla.hap.demo,frameName = "foo")
# Truncating the alleles in hla.hap.demo to 1 field for analysis.
LDWrap(hla.hap.demo,frameName = "foo", trunc=1)
# Analyzing the included HLA genotype data
LDWrap(drb1.dqb1.demo,frameName="hla-genotype-data")
# Writing the resulting files to the working directory
LDWrap(hla.hap.demo,writeTo = getwd())

###LD.sign.test()

# Generating LDWrap() results files for the example data included with this package in the temporary directory
LDWrap(hla.hap.demo)
LDWrap(hla.hap.demo,phased=FALSE)
# Analyzing the results files generated by LDWrap(), with a CSV of the results written to the temporary directory.
LDdata <- paste(tempdir(),"hla-family-data",sep=.Platform$file.sep)
LD.sign.test(LDdata, returnFrame=FALSE)
# Returning only a data frame for the same analysis.
LD.res <- LD.sign.test(LDdata,verbose=FALSE)
View(LD.res)
# Writing the CSV file to the working directory
LD.sign.test(LDdata,returnFrame = FALSE,resultDir = getwd())

###LD.heat.map()

# Generating LDWrap() results files for the example data included with this package in the working directory
LDWrap(hla.hap.demo, writeTo=getwd())
LDWrap(hla.hap.demo,phased=FALSE, writeTo=getwd())
# Generating color heat-map plots based on the LD result files in the working directory
LD.heat.map("hla-family-data")
# Generating greyscale heat-map plots based on the LD result files in the working directory
LD.heat.map("hla-family-data",color=FALSE)
# Generating heat-map plots for phased data alone based on the LD result files in the working directory
LD.heat.map(phasedData="hla-family-data_Phased_LD_results.csv",unphasedLabel="")
# Writing color PNG-formatted heat-map plots to the working directory, using the LD files in the working directory
LD.heat.map("hla-family-data",writePlot = TRUE,writeDir = getwd())

End of vignette.

pould: Phased Or Unphased Linkage Disequilibrium
