The objective of this module is to test for significant SNPs using a
joint approach. For this, we will use the penalizedLMM
R
package, which is available on GitHub. Once
again, we will begin by loading the necessary libraries:
library(data.table)
library(magrittr)
library(qqman)
library(snpStats)
library(dplyr)
# devtools::install_github("areisett/penalizedLMM")
library(penalizedLMM)
Next, we load the data. Start by loading the clinical data, as this has the outcome (coronary artery disease (‘CAD’)) we need for our models.
clinical <- fread("data/penncath.csv")
# str(clinical) # if you need to remind yourself of what is here
We also need to load the genetic data. For this section, we will work
with the quality controlled (QC’d) data from the SNPRelate
package (see module 1). We will also need the “.bim” file from the
original data for making plots. Finally, we need our design matrix \(X\) with no missing values (this is the
\(X\) we obtained by our imputation
procedures).
# load QC'd data:
qc_dat <- readRDS('rds/gwas-qc.rds')
# load the bim file
bim <- fread('data/penncath.bim')
# load design matrix of SNP data (you would need to complete the imputation module first)
X <- readRDS(file = "rds/fully_imputed_numeric.rds")
If you completed the population structure module, you should load the principal components as well - we need them for our analysis. If you did not work through the population structure module, you can skip this step.
# load principal components
PCs <- readRDS(file = "rds/PCs_base.rds") %>%
as.data.frame()
names(PCs) <- paste0("PC", 1:ncol(PCs))
For the joint analysis approach, we will consider hdl
as
the outcome. At this time, the functions in penalizedLMM
are limited to continuous (i.e. numeric) outcomes – we hope to expand
the functionality of this package to generalized modeling in the near
future.
First, we will impute the 92 missing values in hdl
using
the mean. While this may be simplistic, this approach is sufficient for
illustrating the joint analysis technique.
clinical$hdl_impute <- ifelse(is.na(clinical$hdl),
mean(clinical$hdl, na.rm = T),
clinical$hdl)
Now, we will construct a penalized linear mixed model using the MCP
penalty (the default in plmm
). The k
argument
allows us to use a singular value decomposition approach that decreases
computational time. For more details, refer to the
penalizedLMM::plmm()
documentation.
joint_model <- plmm(X = X,
y = clinical$hdl_impute,
k = 4)
# by default, our PLMM will evaluate 100 lambda values
# we can summarize the fit of this model at the 50th lambda value:
print(summary(joint_model, idx = 50))
# MCP-penalized regression model with n=1401, p=752676 at lambda=1.6396
# -------------------------------------------------
# The model converged
# -------------------------------------------------
# # of non-zero coefficients: 1
# -------------------------------------------------
This section is under development