
Performs a hybrid bootstrap to construct quantile-based confidence intervals around the original lasso/MCP/SCAD estimator. Specifically, a traditional pairs bootstrap is performed with one adjustment: if the bootstrap estimate for a given covariate is exactly zero, a random draw from that covariate's full conditional posterior is used in its place. This prevents intervals with endpoints exactly equal to zero.
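The quantile step and the hybrid substitution can be sketched in base R as below. This is a minimal illustration, not the package's internal code: hybrid_percentile_ci() and its posterior_draws argument are hypothetical, and the "posterior" draws are simulated placeholders rather than real full conditional draws. With level = 0.8 the endpoints are the 10% and 90% quantiles of each column.

```r
# Hypothetical helper (not ncvreg's internal code).
# draws:           nboot x p matrix of pairs-bootstrap estimates
# posterior_draws: nboot x p matrix of full conditional posterior draws
hybrid_percentile_ci <- function(draws, posterior_draws, level = 0.95) {
  stopifnot(identical(dim(draws), dim(posterior_draws)))
  # Hybrid adjustment: replace exact zeros with the matching posterior draw
  is_zero <- draws == 0
  draws[is_zero] <- posterior_draws[is_zero]
  # Quantile-based interval for each covariate (column)
  probs <- c((1 - level) / 2, 1 - (1 - level) / 2)
  ci <- t(apply(draws, 2, quantile, probs = probs, names = FALSE))
  colnames(ci) <- c("lower", "upper")
  ci
}

set.seed(1)
nboot <- 1000
# Column "b" is zero in roughly 60% of replicates, as a lasso estimate might be
draws <- cbind(a = rnorm(nboot, 1, 0.2),
               b = ifelse(runif(nboot) < 0.6, 0, rnorm(nboot, 0.1, 0.05)))
# Placeholder posterior draws; real ones would come from the full conditional
posterior_draws <- matrix(rnorm(2 * nboot, 0, 0.05), nboot, 2)

quantile(draws[, "b"], 0.1)   # exactly 0 without the adjustment
ci <- hybrid_percentile_ci(draws, posterior_draws, level = 0.8)
ci
```

Without the substitution, the lower endpoint for b sits exactly at zero because most of its bootstrap estimates are zero; after the substitution neither endpoint is zero.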

Usage

boot_ncvreg(
  X,
  y,
  fit,
  lambda,
  sigma2,
  cluster,
  seed,
  nboot = 1000,
  penalty = "lasso",
  level = 0.95,
  gamma = switch(penalty, SCAD = 3.7, 3),
  alpha = 1,
  returnCV = FALSE,
  return_boot = FALSE,
  verbose = FALSE,
  ...
)

Arguments

X

The design matrix, without an intercept. boot_ncvreg standardizes the data and includes an intercept by default.
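To make the standardization concrete, the sketch below centers each column and scales it so that its mean square equals 1. The helper standardize() is hypothetical, and that boot_ncvreg uses exactly this scaling (divisor n rather than n - 1, as ncvreg's std() is understood to do) is an assumption.

```r
# Hypothetical helper: center each column, then scale it so that
# sum(x^2) / n == 1 (assumed to match ncvreg's convention).
standardize <- function(X) {
  n <- nrow(X)
  center <- colMeans(X)
  Xc <- sweep(X, 2, center)            # subtract column means
  scl <- sqrt(colSums(Xc^2) / n)       # root mean square, divisor n
  Xs <- sweep(Xc, 2, scl, "/")         # divide each column by its scale
  list(X = Xs, center = center, scale = scl)
}

set.seed(2)
X <- matrix(rnorm(50 * 3, mean = 5, sd = 2), nrow = 50, ncol = 3)
s <- standardize(X)
colMeans(s$X)                  # numerically 0
colSums(s$X^2) / nrow(s$X)     # 1 for every column
```

For a linear model, a coefficient b estimated on the standardized scale corresponds to b / scale on the original scale.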

y

The response vector.

fit

(optional) An object of class ncvreg or cv.ncvreg. An ncvreg object supplies the data, penalty choices, and lambda sequence to boot_ncvreg. A cv.ncvreg object can, in addition, provide information for selecting lambda and estimating sigma2. If fit is provided, y should not be supplied, and X should be supplied only if fit does not contain X.

lambda

(optional) The value of lambda at which to compute interval estimates. If missing, lambda is selected by cross-validation (CV). To control the lambda sequence used in that cross-validation, call cv.ncvreg separately and pass the resulting object to fit.

sigma2

(optional) The variance to use for the Hybrid sampling. If missing, sigma2 is set using the CV-based estimator suggested by Reid et al. (2016).
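Reid et al. (2016) estimate the error variance from the cross-validated lasso fit by dividing the residual sum of squares by n minus the number of nonzero coefficients. A base R sketch of that formula follows. reid_sigma2() is a hypothetical helper, and whether boot_ncvreg counts the intercept in the degrees-of-freedom correction is not stated here, so this version counts only the penalized coefficients.

```r
# Hypothetical helper for the Reid et al. (2016) estimator:
# RSS at the selected lambda divided by (n - number of nonzero coefficients).
# Assumption: only penalized coefficients are counted in s_hat.
reid_sigma2 <- function(y, fitted, beta) {
  n <- length(y)
  s_hat <- sum(beta != 0)
  sum((y - fitted)^2) / (n - s_hat)
}

# Toy numbers, not output from boot_ncvreg
y      <- c(1, 2, 3, 4, 5, 6)
fitted <- c(1.5, 1.5, 3, 4, 5.5, 5.5)
beta   <- c(0.8, 0, 0.3)
reid_sigma2(y, fitted, beta)   # RSS = 1, n - s_hat = 4, so 0.25
```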

cluster

Bootstrapping and cv.ncvreg (if applicable) can be run in parallel across a cluster using the parallel package. The cluster must be created in advance with parallel::makeCluster() and then passed to boot_ncvreg.
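A minimal sketch of parallel use, assuming ncvreg and its Prostate data are installed. The two-worker cluster and nboot = 200 are arbitrary choices to keep the run short, and loading ncvreg on the workers with clusterEvalQ() is a precaution that may be unnecessary.

```r
library(ncvreg)
library(parallel)

data(Prostate)
X <- Prostate$X
y <- Prostate$y

cl <- makeCluster(2)                    # worker count chosen arbitrarily
invisible(clusterEvalQ(cl, library(ncvreg)))  # precaution: load ncvreg on workers
res <- boot_ncvreg(X, y, cluster = cl, seed = 1, nboot = 200, level = 0.8)
stopCluster(cl)                         # release the workers when finished

res$confidence_intervals
```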

seed

You may set the seed of the random number generator to obtain reproducible results. The seed applies to the overall process. To set a seed specifically for cv.ncvreg, call it separately and pass the fitted object to fit.
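The seeding advice above can be followed as in this sketch. It assumes the cv.ncvreg object retains X (if it does not, pass X as well), and the seeds and nboot = 200 are arbitrary. Repeating the bootstrap with the same seed should reproduce the same intervals.

```r
library(ncvreg)

data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Cross-validation run separately, with its own seed for the fold assignment
cvfit <- cv.ncvreg(X, y, penalty = "lasso", seed = 123)

# Pass the fit instead of y (and, assuming the fit stores it, instead of X)
res1 <- boot_ncvreg(fit = cvfit, seed = 42, nboot = 200)
res2 <- boot_ncvreg(fit = cvfit, seed = 42, nboot = 200)
identical(res1$confidence_intervals, res2$confidence_intervals)
```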

nboot

The number of bootstrap replications to use.

penalty

The penalty to be applied to the model. Either "lasso" (the default), "MCP", or "SCAD".

level

The confidence level required.

gamma

The tuning parameter of the MCP/SCAD penalty (see ncvreg for details). Default is 3 for MCP and 3.7 for SCAD. Ignored if fit is provided.

alpha

Tuning parameter for the elastic net estimator, which controls the relative contributions of the lasso/MCP/SCAD penalty and the ridge (L2) penalty. alpha=1 is equivalent to the lasso/MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0. Ignored if fit is provided.
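Assuming boot_ncvreg follows ncvreg's parameterization of alpha, the penalized objective for a Gaussian outcome can be written as below, where p(· | λ, γ) is the lasso, MCP, or SCAD penalty.

```latex
Q(\beta) = \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \sum_{j=1}^{p} p\left(\beta_j \mid \alpha\lambda, \gamma\right)
  + \frac{(1-\alpha)\lambda}{2}\sum_{j=1}^{p} \beta_j^2
```

Setting alpha = 1 removes the ridge term, and as alpha approaches 0 the lasso/MCP/SCAD term vanishes.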

returnCV

If TRUE, the cv.ncvreg fit will be returned (if applicable).

return_boot

If TRUE, the bootstrap draws will be returned.

verbose

If FALSE, non-essential messages are suppressed.

...

Named arguments to be passed to ncvreg and cv.ncvreg.

Value

A list with:

confidence_intervals

A data.frame with the original point estimates along with lower and upper bounds of Hybrid CIs.

lambda

The value of lambda the confidence_intervals were constructed at.

sigma2

The value of sigma2 used for the Hybrid bootstrap sampling.

penalty

The penalty the intervals correspond to.

alpha

The tuning parameter alpha used for the elastic net (Enet) estimator.

level

The confidence level the intervals correspond to.

If a penalty other than "lasso" is used,

gamma

The tuning parameter for MCP/SCAD penalty.

If returnCV is TRUE and a cv.ncvreg object was fit or supplied

cv.ncvreg

The cv.ncvreg fit used to estimate lambda and sigma2 (if applicable).

If return_boot is TRUE

boot_draws

A data.frame of the Hybrid bootstrap draws.

Details

The resulting intervals WILL NOT have exact nominal coverage for every covariate. Instead, they are constructed so that average coverage across covariates will be approximately equal to the nominal level, provided the true distribution of the betas is Laplace and the covariates are independent. In practice, however, average coverage is fairly robust to these assumptions.
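The distinction between per-covariate and average coverage can be made concrete with a small helper. coverage_summary() is hypothetical, and the interval endpoints below are made-up toy values, not output from boot_ncvreg. In a simulation study each row would correspond to one simulated data set.

```r
# Hypothetical helper: `lower` and `upper` are replicate x covariate matrices
# of interval endpoints; `beta_true` holds the true coefficient of each covariate.
coverage_summary <- function(lower, upper, beta_true) {
  truth <- matrix(beta_true, nrow(lower), ncol(lower), byrow = TRUE)
  covered <- lower <= truth & truth <= upper
  list(per_covariate = colMeans(covered),  # coverage of each covariate
       average = mean(covered))            # coverage averaged over covariates
}

# Toy endpoints for 2 replicates and 3 covariates
beta_true <- c(0, 0.5, 1)
lower <- rbind(c(-0.1, 0.6, 0.80), c(-0.2, 0.3, 0.90))
upper <- rbind(c( 0.1, 0.9, 1.20), c(0.05, 0.7, 0.95))
cov <- coverage_summary(lower, upper, beta_true)
cov   # per_covariate: 1.0 0.5 0.5; average: 4/6
```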

Note: Draws from the full conditional posterior are approximations for MCP/SCAD or when alpha is not 1.
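For the lasso (alpha = 1), a full conditional draw can be sketched under stated assumptions: a Gaussian outcome, a standardized covariate with x_j'x_j = n, the lasso objective read as a posterior with noise variance sigma2 and a Laplace prior, and z = x_j'r_j / n computed from the partial residual r_j. Under those assumptions the full conditional is proportional to exp(-n(beta - z)^2 / (2 sigma2) - n lambda |beta| / sigma2), a two-component mixture of truncated normals. The function rlasso_full_conditional() and its arguments are hypothetical; this is not presented as the package's implementation.

```r
# Hypothetical sampler for the lasso full conditional of one coefficient.
# Assumptions: Gaussian outcome, standardized covariate (x_j'x_j = n), and
# z = x_j'r_j / n from the partial residual r_j. Target density:
#   exp(-n * (beta - z)^2 / (2 * sigma2) - n * lambda * abs(beta) / sigma2)
rlasso_full_conditional <- function(k, z, lambda, sigma2, n) {
  s <- sqrt(sigma2 / n)
  m_pos <- z - lambda   # mean of the component truncated to (0, Inf)
  m_neg <- z + lambda   # mean of the component truncated to (-Inf, 0)
  # Log mixture weights (a constant shared by both has been dropped)
  log_w_pos <- -z * lambda / s^2 + pnorm(m_pos / s, log.p = TRUE)
  log_w_neg <-  z * lambda / s^2 + pnorm(-m_neg / s, log.p = TRUE)
  p_pos <- plogis(log_w_pos - log_w_neg)
  positive <- runif(k) < p_pos
  # Inverse-CDF sampling from each truncated normal, on the log scale
  log_u <- log(runif(k))
  draws <- numeric(k)
  draws[positive] <- m_pos -
    s * qnorm(log_u[positive] + pnorm(m_pos / s, log.p = TRUE), log.p = TRUE)
  draws[!positive] <- m_neg +
    s * qnorm(log_u[!positive] + pnorm(-m_neg / s, log.p = TRUE), log.p = TRUE)
  draws
}

set.seed(3)
d0 <- rlasso_full_conditional(20000, z = 0, lambda = 0.1, sigma2 = 1, n = 100)
mean(d0 > 0)   # close to 0.5, by symmetry when z = 0
```

With lambda = 0 the mixture collapses to a normal with mean z and variance sigma2 / n, which is a quick sanity check. Working on the log scale (pnorm and qnorm with log.p = TRUE) keeps extreme truncation points from underflowing.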

Examples

data(Prostate)
X <- Prostate$X
y <- Prostate$y
boot_ncvreg(X, y, level = 0.8)
#> $confidence_intervals
#>            estimates       lower         upper
#> lcavol   0.562243689  0.45981106  0.6567666354
#> lweight  0.620231684  0.32127116  0.9009464310
#> age     -0.020914110 -0.03347420 -0.0078682197
#> lbph     0.095850091  0.02135000  0.1803362582
#> svi      0.755631635  0.46751421  1.0634167004
#> lcp     -0.101882411 -0.21771903 -0.0002548175
#> gleason  0.048094456 -0.12319034  0.2360152589
#> pgg45    0.004379408 -0.00109211  0.0105023104
#> 
#> $lambda
#> [1] 0.0008434274
#> 
#> $sigma2
#> [1] 0.4838195
#> 
#> $penalty
#> [1] "lasso"
#> 
#> $alpha
#> [1] 1
#> 
#> $level
#> [1] 0.8
#>