Performs a hybrid bootstrapping approach to construct quantile-based confidence intervals around the original lasso/MCP/SCAD estimator. Specifically, a traditional pairs bootstrap is performed with one adjustment: if the bootstrap estimate for a given covariate is exactly zero, a random draw from the full conditional posterior is used as the bootstrap draw instead. This avoids the creation of intervals with endpoints exactly equal to zero.
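The following base R sketch illustrates the pairs bootstrap and quantile interval described above for a single standardized covariate, where the lasso solution reduces to soft-thresholding. It is not the package's implementation: the helper names (soft, lasso_1d), the simulated data, and the value of lambda are assumptions made for illustration, and the posterior draw itself is not implemented. The sketch only reports how many bootstrap draws are exactly zero, which are the draws the hybrid step would replace.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)
lambda <- 0.1  # hypothetical penalty value, chosen for illustration only

# Soft-thresholding operator
soft <- function(z, l) sign(z) * pmax(abs(z) - l, 0)

# Lasso estimate for one covariate after centering y and standardizing x
# (estimate is on the standardized scale)
lasso_1d <- function(x, y, l) {
  xs <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  soft(mean(xs * (y - mean(y))), l)
}

nboot <- 1000
draws <- replicate(nboot, {
  idx <- sample(n, replace = TRUE)  # resample (x, y) pairs together
  lasso_1d(x[idx], y[idx], lambda)
})

# Quantile-based interval from the bootstrap draws
level <- 0.8
ci <- quantile(draws, c((1 - level) / 2, 1 - (1 - level) / 2))

prop_zero <- mean(draws == 0)  # share of draws the hybrid step would replace
ci
prop_zero
```

When a large share of draws is exactly zero, a plain pairs bootstrap can place an interval endpoint at zero; that is the situation the hybrid adjustment is meant to avoid.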
Usage
boot_ncvreg(
X,
y,
fit,
lambda,
sigma2,
cluster,
seed,
nboot = 1000,
penalty = "lasso",
level = 0.95,
gamma = switch(penalty, SCAD = 3.7, 3),
alpha = 1,
returnCV = FALSE,
return_boot = FALSE,
verbose = FALSE,
...
)
Arguments
- X
The design matrix, without an intercept. boot_ncvreg standardizes the data and includes an intercept by default.
- y
The response vector.
- fit
(optional) An object of class ncvreg or cv.ncvreg. An object of class ncvreg provides data, penalty choices, and lambda sequence to boot_ncvreg. An object of class cv.ncvreg can in addition provide information for selecting lambda and estimating sigma2. If provided, y should not be provided and X should only be provided if fit does not contain X.
- lambda
(optional) The value of lambda to provide interval estimates for. If left missing, it will be selected using CV. If the user wants to set the lambda sequence used to select lambda via cross-validation, they should call cv.ncvreg separately and pass the resulting object to fit.
- sigma2
(optional) The variance to use for the Hybrid sampling. If left missing, it will be set using the estimator suggested by Reid et al. (2016) using CV.
- cluster
Bootstrapping and cv.ncvreg (if applicable) can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using the parallel::makeCluster() function from that package. The cluster must then be passed to boot_ncvreg.
- seed
You may set the seed of the random number generator in order to obtain reproducible results. This is set for the overall process. If the user wishes to set a seed specifically for cv.ncvreg, they should call it separately and then pass the fitted object as an argument to fit.
- nboot
The number of bootstrap replications to use.
- penalty
The penalty to be applied to the model. Either "lasso" (the default), "MCP", or "SCAD".
- level
The confidence level required.
- gamma
The tuning parameter of the MCP/SCAD penalty (see ncvreg for details). Default is 3 for MCP and 3.7 for SCAD. Ignored if fit is provided.
- alpha
Tuning parameter for the elastic net estimator, which controls the relative contributions from the lasso/MCP/SCAD penalty and the ridge, or L2, penalty. alpha=1 is equivalent to the lasso/MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0. Ignored if fit is provided.
- returnCV
If TRUE, the cv.ncvreg fit will be returned (if applicable).
- return_boot
If TRUE, the bootstrap draws will be returned.
- verbose
If FALSE, non-essential messages are suppressed.
- ...
Named arguments to be passed to ncvreg and cv.ncvreg.
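As described for the fit and seed arguments, cross-validation can be run separately and the resulting object handed to boot_ncvreg. The sketch below assumes the ncvreg package is installed and that the cv.ncvreg object retains X (so neither X nor y is passed again); the seed values and the reduced nboot are arbitrary choices made to keep the example quick.

```r
library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Run cross-validation separately, with its own seed
cvfit <- cv.ncvreg(X, y, penalty = "lasso", seed = 123)

# Pass the fitted object via `fit`; lambda and sigma2 are then taken or
# estimated from the CV fit rather than from a fresh cv.ncvreg call
res_fit <- boot_ncvreg(fit = cvfit, level = 0.8, nboot = 200, seed = 456)
res_fit$lambda
res_fit$confidence_intervals
```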
Value
A list with:
- confidence_intervals
A data.frame with the original point estimates along with lower and upper bounds of Hybrid CIs.
- lambda
The value of lambda at which the confidence_intervals were constructed.
- sigma2
The value of sigma2 used for the Hybrid bootstrap sampling.
- penalty
The penalty the intervals correspond to.
- alpha
The tuning parameter for the elastic net estimator used.
- level
The confidence level the intervals correspond to.
If a penalty other than "lasso" is used,
- gamma
The tuning parameter for MCP/SCAD penalty.
If returnCV is TRUE and a cv.ncvreg object was fit or supplied,
- cv.ncvreg
The cv.ncvreg fit used to estimate lambda and sigma2 (if applicable).
If return_boot is TRUE,
- boot_draws
A data.frame of the Hybrid bootstrap draws.
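The components listed above can be pulled from the returned list by name. This short sketch assumes the ncvreg package is installed; nboot is reduced and the seed is arbitrary, both only to keep the example quick and reproducible.

```r
library(ncvreg)
data(Prostate)

res <- boot_ncvreg(Prostate$X, Prostate$y, level = 0.8,
                   nboot = 200, seed = 1, return_boot = TRUE)

names(res)                 # components of the returned list
res$confidence_intervals   # point estimates with lower and upper bounds
head(res$boot_draws)       # present because return_boot = TRUE
```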
Details
The resulting intervals WILL NOT have exact nominal coverage for all covariates. They are instead constructed in a way that overall coverage will be approximately equal to nominal so long as the true distribution of betas is Laplace and the covariates are independent. That said, in practice, average coverage is fairly robust to these assumptions.
Note: Draws from the full conditional posterior are approximations for MCP/SCAD or when alpha is not 1.
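A call with a non-lasso penalty, the case in which the posterior draws are approximate, might look like the following. The sketch assumes the ncvreg package is installed; nboot is reduced and the seed is arbitrary, both for illustration.

```r
library(ncvreg)
data(Prostate)

# MCP penalty; gamma is left at its documented default for MCP
res_mcp <- boot_ncvreg(Prostate$X, Prostate$y, penalty = "MCP",
                       level = 0.8, nboot = 200, seed = 1)

res_mcp$penalty
res_mcp$gamma                 # returned because the penalty is not "lasso"
res_mcp$confidence_intervals
```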
Examples
data(Prostate)
X <- Prostate$X
y <- Prostate$y
boot_ncvreg(X, y, level = 0.8)
#> $confidence_intervals
#> estimates lower upper
#> lcavol 0.562243689 0.45981106 0.6567666354
#> lweight 0.620231684 0.32127116 0.9009464310
#> age -0.020914110 -0.03347420 -0.0078682197
#> lbph 0.095850091 0.02135000 0.1803362582
#> svi 0.755631635 0.46751421 1.0634167004
#> lcp -0.101882411 -0.21771903 -0.0002548175
#> gleason 0.048094456 -0.12319034 0.2360152589
#> pgg45 0.004379408 -0.00109211 0.0105023104
#>
#> $lambda
#> [1] 0.0008434274
#>
#> $sigma2
#> [1] 0.4838195
#>
#> $penalty
#> [1] "lasso"
#>
#> $alpha
#> [1] 1
#>
#> $level
#> [1] 0.8
#>
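The cluster argument described above can be exercised as follows. This is a sketch assuming the ncvreg and parallel packages are available; the choice of two workers, the reduced nboot, and the seed are arbitrary.

```r
library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Set up the cluster in advance, then pass it to boot_ncvreg
cl <- parallel::makeCluster(2)
res_par <- boot_ncvreg(X, y, level = 0.8, nboot = 200,
                       cluster = cl, seed = 1)
parallel::stopCluster(cl)  # release the workers when finished

res_par$confidence_intervals
```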