The objective of this module is to test for significant SNPs using a joint approach. For this, we will use the penalizedLMM R package, which is available on GitHub. Once again, we will begin by loading the necessary libraries:

library(data.table)
library(magrittr)
library(qqman)
library(snpStats)
library(dplyr)

# devtools::install_github("areisett/penalizedLMM")
library(penalizedLMM)

Next, we load the data. Start by loading the clinical data, as this has the outcome (coronary artery disease (‘CAD’)) we need for our models.

clinical <- fread("data/penncath.csv")
# str(clinical) # if you need to remind yourself of what is here

We also need to load the genetic data. For this section, we will work with the quality controlled (QC’d) data from the SNPRelate package (see module 1). We will also need the “.bim” file from the original data for making plots. Finally, we need our design matrix \(X\) with no missing values (this is the \(X\) we obtained by our imputation procedures).

# load QC'd data:
qc_dat <- readRDS('rds/gwas-qc.rds')
# load the bim file
bim <- fread('data/penncath.bim')
# load design matrix of SNP data (you would need to complete the imputation module first)
X <- readRDS(file = "rds/fully_imputed_numeric.rds")

If you completed the population structure module, you should load the principal components as well - we need them for our analysis. If you did not work through the population structure module, you can skip this step.

# load principal components 
PCs <- readRDS(file = "rds/PCs_base.rds") %>%
  as.data.frame()

names(PCs) <- paste0("PC", 1:ncol(PCs))

Constructing the model

For the joint analysis approach, we will consider hdl as the outcome. At this time, the functions in penalizedLMM are limited to continuous (i.e. numeric) outcomes – we hope to expand the functionality of this package to generalized modeling in the near future.

First, we will impute the 92 missing values in hdl using the mean. While this may be simplistic, this approach is sufficient for illustrating the joint analysis technique.

clinical$hdl_impute <- ifelse(is.na(clinical$hdl),
                                  mean(clinical$hdl, na.rm = T),
                                  clinical$hdl)

Now, we will construct a penalized linear mixed model using the MCP penalty (the default in plmm). The k argument allows us to use a singular value decomposition approach that decreases computational time. For more details, refer to the penalizedLMM::plmm() documentation.

joint_model <- plmm(X = X,
                    y = clinical$hdl_impute,
                    k = 4)

# by default, our PLMM will evaluate 100 lambda values
# we can summarize the fit of this model at the 50th lambda value:
print(summary(joint_model, idx = 50))
# MCP-penalized regression model with n=1401, p=752676 at lambda=1.6396
# -------------------------------------------------
# The model converged 
# -------------------------------------------------
# # of non-zero coefficients:  1 
# -------------------------------------------------

Examining the results

This section is under development