grpreg fits models that fall into the penalized likelihood framework, in which we estimate \boldsymbol{\beta} by minimizing the objective function
Q(\boldsymbol{\beta}|\mathbf{X}, \mathbf{y}) = L(\boldsymbol{\beta}|\mathbf{X},\mathbf{y}) + P_\lambda(\boldsymbol{\beta}),
where L(\boldsymbol{\beta}|\mathbf{X},\mathbf{y})
is the loss (deviance) and P_\lambda(\boldsymbol{\beta}) is the penalty.
This article describes the different penalties available in grpreg; see the article on models for more information on the different loss functions available.
The following notation is used throughout (recall that the design matrix \mathbf{X} is decomposed into groups \mathbf{X}_1, \mathbf{X}_2, \ldots):
- \boldsymbol{\beta} denotes the entire vector of regression coefficients
- \boldsymbol{\beta}_j denotes the vector of regression coefficients corresponding to the jth group
- \beta_{jk} denotes the kth regression coefficient in the jth group
- \lVert\boldsymbol{\beta}_j\rVert_2 denotes the Euclidean (L_2) norm of \boldsymbol{\beta}_j: \lVert x\rVert_2 = \sqrt{x_1^2 + x_2^2 + \ldots}
- \lVert\boldsymbol{\beta}_j\rVert_1 denotes the L_1 norm of \boldsymbol{\beta}_j: \lVert x\rVert_1 = \left\lvert x_1\right\rvert + \left\lvert x_2\right\rvert + \ldots
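The examples below assume that a design matrix, response, and grouping structure are already in hand. As a minimal sketch, the Birthwt data that ships with grpreg supplies all three (X, y, and group here are taken from that example dataset):

```r
library(grpreg)
data(Birthwt)
X <- Birthwt$X          # design matrix; columns fall into groups
y <- Birthwt$bwt        # continuous response (birth weight)
group <- Birthwt$group  # group membership of each column of X
```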
Group selection
These penalties are sparse at the group level: the coefficients within a group will either all equal zero or all be nonzero.
If you use any of these penalties, please cite
- Breheny P and Huang J (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187. [pdf].
The article goes into greater mathematical detail, discusses issues of standardization in the group sense, and provides references.
The group lasso was originally proposed in
- Yuan M. and Lin Y. (2006) Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68: 49-67.
Group lasso
grpreg(X, y, group, penalty="grLasso")
P(\boldsymbol{\beta}) = \lambda\sum_j \lVert\boldsymbol{\beta}_j\rVert_2
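For example, with the setup above, a group lasso path can be fit and tuned by cross-validation as follows (a sketch using cv.grpreg, grpreg's cross-validation function):

```r
fit <- grpreg(X, y, group, penalty="grLasso")
plot(fit)    # coefficient paths; all members of a group enter or leave together

cvfit <- cv.grpreg(X, y, group, penalty="grLasso")
coef(cvfit)  # coefficients at the lambda minimizing cross-validation error
```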
Group MCP
grpreg(X, y, group, penalty="grMCP")
P(\boldsymbol{\beta}) = \sum_j \textrm{MCP}_{\lambda, \gamma}(\lVert\boldsymbol{\beta}_j\rVert_2)
where \textrm{MCP}_{\lambda, \gamma}(\cdot) denotes the MCP penalty with regularization parameter \lambda and tuning parameter \gamma.
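In grpreg, \gamma is supplied via the gamma argument (its default is 3 for MCP-type penalties); for instance:

```r
# Group MCP; larger gamma makes the penalty behave more like the group
# lasso, smaller gamma makes it more aggressively nonconvex
fit <- grpreg(X, y, group, penalty="grMCP", gamma=4)
```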
Group SCAD
grpreg(X, y, group, penalty="grSCAD")
P(\boldsymbol{\beta}) = \sum_j \textrm{SCAD}_{\lambda, \gamma}(\lVert\boldsymbol{\beta}_j\rVert_2)
where \textrm{SCAD}_{\lambda, \gamma}(\cdot) denotes the SCAD penalty with regularization parameter \lambda and tuning parameter \gamma.
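The same interface extends to other loss functions (see the article on models); for instance, a sketch of a logistic regression fit with group SCAD, using the binary outcome Birthwt$low from the example data:

```r
# Group SCAD with logistic loss; gamma defaults to 4 for SCAD-type penalties
fit <- grpreg(X, Birthwt$low, group, penalty="grSCAD", family="binomial")
```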
Bi-level selection
These penalties are sparse at both the group and individual levels. In some groups, all coefficients will equal zero. However, even if a group is selected, some of the coefficients within that group may still be zero.
Group exponential lasso (GEL)
grpreg(X, y, group, penalty="gel")
P(\boldsymbol{\beta}) = \sum_j f_{\lambda, \tau}(\lVert\boldsymbol{\beta}_j\rVert_1)
where f_{\lambda, \tau}(\cdot) denotes the exponential penalty with regularization parameter \lambda and tuning parameter \tau:
f_{\lambda, \tau}(\theta) = \frac{\lambda^2}{\tau}\left\{1-\exp\left(-\frac{\tau\theta}{\lambda}\right)\right\}
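In grpreg, \tau is supplied via the tau argument (its default is 1/3); a minimal sketch:

```r
# Group exponential lasso; tau controls the shape of the exponential
# penalty and thereby the balance between group- and individual-level sparsity
fit <- grpreg(X, y, group, penalty="gel", tau=1/3)
```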
If you use the GEL penalty, please cite
- Breheny P (2015). The group exponential lasso for bi-level variable selection. Biometrics, 71: 731-740. [pdf].
Composite MCP
grpreg(X, y, group, penalty="cMCP")
P(\boldsymbol{\beta}) = \sum_j \textrm{MCP}_{\lambda, \gamma_1} \left( \sum_k \textrm{MCP}_{\lambda, \gamma_2} (\left\lvert\beta_{jk}\right\rvert) \right)
where \textrm{MCP}_{\lambda, \gamma}(\cdot) denotes the MCP penalty with regularization parameter \lambda and tuning parameter \gamma.
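Because composite MCP is a bi-level penalty, coefficients within a selected group can remain exactly zero. A minimal sketch (the coef method accepts a lambda value at which to extract coefficients; lambda=0.05 here is arbitrary):

```r
fit <- grpreg(X, y, group, penalty="cMCP")
coef(fit, lambda=0.05)  # some within-group coefficients may be exactly zero
```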
If you use the composite MCP penalty, please cite either of the following papers:
- Breheny P and Huang J (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface, 2: 369-380. [pdf]
- Huang J, Breheny P and Ma S (2012). A selective review of group selection in high-dimensional models. Statistical Science, 27: 481-499. [pdf]
Please note that there is some confusion around the name “group MCP”. In the first paper above (2009), the composite MCP penalty was referred to as the “group MCP” penalty; the second paper (2012), in reviewing the various kinds of group penalties that had been proposed, recommended changing the name to “composite MCP” to avoid confusion with the “group MCP” defined above.
Group bridge
gBridge(X, y, group)
P(\boldsymbol{\beta}) = \lambda \sum_j K_j^\gamma \lVert\boldsymbol{\beta}_j\rVert_1^\gamma
where K_j denotes the number of elements in group j.
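In gBridge, the exponent \gamma is supplied via the gamma argument (its default is 0.5); a minimal sketch:

```r
# Group bridge; note the separate gBridge() interface rather than grpreg()
fit <- gBridge(X, y, group, gamma=0.5)
```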
Please note that the group bridge penalty uses a very different algorithm from the other penalties. Due to the nature of the penalty, model fitting is slower and less stable for group bridge models. This was, in fact, the main motivation for the GEL penalty described above: to offer a more tractable alternative to group bridge that has similar estimation properties but is much better behaved from a numerical optimization perspective.
If you use the group bridge penalty, please cite either of the following papers:
- Huang J, Ma S, Xie H and Zhang C (2009). A group bridge approach for variable selection. Biometrika, 96: 339-355.
- Breheny P and Huang J (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface, 2: 369-380. [pdf]
The first paper proposed the method; the second paper proposed the
algorithm that is used in the grpreg
package.
Specifying an additional ridge component
For all of the penalties in the previous sections, grpreg allows the specification of an additional ridge (L_2) component to the penalty. Setting the alpha argument splits the regularization into \lambda_1 = \alpha\lambda and \lambda_2=(1-\alpha)\lambda, with the penalty given by
P(\boldsymbol{\beta}) = P_1(\boldsymbol{\beta}|\lambda_1) + \frac{\lambda_2}{2}\lVert\boldsymbol{\beta}\rVert_2^2,
where P_1 is any of the penalties from the earlier sections. So, for example,
grpreg(X, y, group, penalty="grLasso", alpha=0.75)
will fit a model with penalty
P(\boldsymbol{\beta}) = 0.75\lambda\sum_j \lVert\boldsymbol{\beta}_j\rVert_2 + \frac{0.25\lambda}{2}\lVert\boldsymbol{\beta}\rVert_2^2.
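The alpha argument is likewise passed through by cv.grpreg, so the mixed penalty can be tuned over \lambda by cross-validation; for example:

```r
cvfit <- cv.grpreg(X, y, group, penalty="grLasso", alpha=0.75)
plot(cvfit)  # cross-validation error as a function of lambda
```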