Starting in version 3.4, grpreg offers an interface for setting up, fitting, and visualizing additive models. Numeric features are automatically expanded using spline basis functions. The basic idea was first proposed by Ravikumar et al. (2009), who called it SPAM, for sparse additive models. The original proposal involved the group lasso penalty, but any of grpreg’s penalty functions can be used instead. The basic usage is illustrated below.
Let’s start by generating some nonlinear data:
Data <- gen_nonlinear_data(n=1000)
Data$X[1:5, 1:5]
#            V01       V02       V03       V04       V05
# [1,] 0.8776286 0.3969330 0.5961121 0.4575194 0.8364502
# [2,] 0.6678036 0.2843176 0.2561765 0.4477876 0.6467037
# [3,] 0.3923234 0.7491731 0.5052994 0.1677680 0.5653253
# [4,] 0.2085937 0.7791867 0.4974960 0.2472813 0.3532472
# [5,] 0.7870270 0.9652729 0.4444696 0.9986034 0.7478494
dim(Data$X)
# [1] 1000   16
Data$X contains 16 numeric features, named V01, V02, and so on. Each of those features can be expanded via the expand_spline() function:
X <- expand_spline(Data$X)
X$X[1:5, 1:5]
#            V01_1     V01_2       V01_3       V02_1     V02_2
# [1,]  0.23208305 0.3578202  0.40188120  0.09184666 0.5384536
# [2,]  0.54924198 0.3365813 -0.05057158 -0.06567846 0.5052250
# [3,]  0.10949785 0.5335455 -0.36196307  0.51162509 0.3187681
# [4,] -0.09172775 0.4255439 -0.29051912  0.46691506 0.3208760
# [5,]  0.44158340 0.3230283  0.19200753 -0.02963445 0.4093286
dim(X$X)
# [1] 1000   48
head(X$group)
# [1] "V01" "V01" "V01" "V02" "V02" "V02"
The resulting object is a list that contains the expanded matrix X$X and the group assignments X$group, along with some metadata needed by internal functions. Note that X$X now contains 48 columns: each of the 16 numeric features (e.g., V01) has been expanded into a 3-column matrix (V01_1, V01_2, V01_3). By default, expand_spline() uses natural cubic splines with three degrees of freedom; consult its documentation for additional options.
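To make the expansion concrete, here is a minimal sketch of what a three-degree-of-freedom natural cubic spline basis looks like for a single feature, using splines::ns() from base R. This is only an illustration of the idea: the simulated feature x and the column names are assumptions, and expand_spline() may apply additional processing, so its columns need not match these values.

```r
library(splines)  # ships with base R

set.seed(1)
x <- runif(1000)    # one simulated numeric feature (illustrative only)

# Natural cubic spline basis with three degrees of freedom:
# one feature becomes a 3-column matrix
B <- ns(x, df = 3)
colnames(B) <- paste0("V01_", 1:3)  # naming chosen to mimic the expanded columns

# All three columns belong to the same group, so a group penalty
# selects or drops the feature as a whole
group <- rep("V01", ncol(B))

dim(B)      # 1000 rows, 3 columns
head(B, 3)
```

Stacking such blocks side by side for all 16 features gives a 48-column matrix with one group label per feature.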
This expanded matrix can now be passed to grpreg():
fit <- grpreg(X, Data$y)
Note that it is not necessary to pass grouping information in this case, as it is contained within the X object. At this point, all of the usual tools (predict(), etc.) can be used, as well as plot.grpreg(). However, grpreg also offers a function, plot_spline(), specific to additive models:
plot_spline(fit, "V02", lambda = 0.03)
Partial residuals can be included in these plots as well:
plot_spline(fit, "V02", lambda = 0.03, partial=TRUE)
By default, these plots are centered such that at the mean of \(x\) (where \(x\) denotes the feature being plotted), the \(y\) value is zero. Alternatively, if type="conditional" is specified, plot_spline() will construct a plot in which the vertical axis represents model predictions as \(x\) varies and all other features are fixed at their mean value:
plot_spline(fit, "V02", lambda = 0.03, partial=TRUE, type='conditional')
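The relationship between the two plot types can be written out explicitly. The notation here is introduced only for illustration: \(\hat{f}_j\) denotes the fitted spline component for feature \(j\), \(\hat{\beta}_0\) the intercept, and \(\bar{x}_k\) the mean of feature \(k\). Under the description above, the default (centered) plot for feature \(j\) shows

\[
g_j(x) = \hat{f}_j(x) - \hat{f}_j(\bar{x}_j),
\]

while the conditional plot shows

\[
h_j(x) = \hat{\beta}_0 + \hat{f}_j(x) + \sum_{k \neq j} \hat{f}_k(\bar{x}_k).
\]

Their difference, \(h_j(x) - g_j(x) = \hat{\beta}_0 + \sum_{k} \hat{f}_k(\bar{x}_k)\), does not depend on \(x\), so the two curves differ only by a constant vertical shift.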
Comparing these two plots, note that the general contours are the same; the only difference is in the values on the vertical axis. Here are the plots for the first 9 features:
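One way to draw such a grid is sketched below. It assumes that plot_spline() draws with base graphics, so that par(mfrow=) arranges the panels; the 3-by-3 layout, the margins, and the features vector are illustrative choices rather than anything prescribed by the package.

```r
library(grpreg)

Data <- gen_nonlinear_data(n = 1000)
X <- expand_spline(Data$X)
fit <- grpreg(X, Data$y)

# Names of the first nine features: "V01", ..., "V09"
features <- sprintf("V%02d", 1:9)

# 3-by-3 panel layout (assumes plot_spline() uses base graphics)
op <- par(mfrow = c(3, 3), mar = c(4.5, 4.5, 0.5, 0.5))
for (v in features) plot_spline(fit, v, lambda = 0.03, partial = TRUE)
par(op)  # restore the previous graphics settings
```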
In the generating model, variables 3 and 4 had a linear relationship with the outcome, variables 1, 2, 5, and 6 had nonlinear relationships, and all other variables were unrelated. The sparse additive model has captured this nicely.
These tools work with cross-validation as one would expect (by default plotting the fit that minimizes cross-validation error):
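A minimal sketch of the cross-validated workflow follows. It assumes that cv.grpreg() accepts the expanded object in the same way grpreg() does; the seed is arbitrary and only makes the random fold assignment reproducible.

```r
library(grpreg)

Data <- gen_nonlinear_data(n = 1000)
X <- expand_spline(Data$X)

set.seed(1)  # fold assignment is random
cvfit <- cv.grpreg(X, Data$y)

# No lambda is supplied, so the fit minimizing cross-validation error
# is the one plotted, as described above
plot_spline(cvfit, "V02", partial = TRUE)
```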
Finally, these tools work with survival and glm models as well. Here, all plots are returned on the linear predictor scale, and the residuals are deviance residuals.