I’ve made a macro to estimate a restricted cubic spline (RCS) basis in SPSS. Splines are useful exploratory tools for modeling non-linear relationships by transforming the independent variables in multiple regression equations. See Durrleman and Simon (1989) for a simple intro. I’ve largely based my implementation on the various advice Frank Harrell has floating around the internet (see the `rcspline` function in his Hmisc R package), although I haven’t read his book (yet!!).

So here is the SPSS MACRO, and below is an example of its implementation. It either takes an arbitrary number of knots and places them at default locations according to quantiles of x, or lets you specify the exact locations of the knots. RCS need at least three knots, because they are restricted to be linear in the tails, and so the macro will return *k - 2* bases (where *k* is the number of knots) to be used alongside the original variable. Below is an example utilizing the default knot locations, followed by a plot of the 95% prediction intervals and predicted values superimposed on a scatterplot.

```
FILE HANDLE macroLoc /name = "D:\Temp\Restricted_Cubic_Splines".
INSERT FILE = "macroLoc\MACRO_RCS.sps".
*Example of their use - data example taken from http://www-01.ibm.com/support/docview.wss?uid=swg21476694.
dataset close ALL.
output close ALL.
SET SEED = 2000000.
INPUT PROGRAM.
LOOP xa = 1 TO 35.
LOOP rep = 1 TO 3.
LEAVE xa.
END case.
END LOOP.
END LOOP.
END file.
END INPUT PROGRAM.
EXECUTE.
* EXAMPLE 1.
COMPUTE y1=3 + 3*xa + normal(2).
IF (xa gt 15) y1=y1 - 4*(xa-15).
IF (xa gt 25) y1=y1 + 2*(xa-25).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y1.
*Make spline basis.
*set mprint on.
!rcs x = xa n = 4.
*Estimate regression equation.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10) CIN(95)
/NOORIGIN
/DEPENDENT y1
/METHOD=ENTER xa /METHOD=ENTER splinex1 splinex2
/SAVE PRED ICIN .
formats y1 xa PRE_1 LICI_1 UICI_1 (F2.0).
*Now I can plot the observed, predicted, and the intervals.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=xa y1 PRE_1 LICI_1 UICI_1
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: xa=col(source(s), name("xa"))
DATA: y1=col(source(s), name("y1"))
DATA: PRE_1=col(source(s), name("PRE_1"))
DATA: LICI_1=col(source(s), name("LICI_1"))
DATA: UICI_1=col(source(s), name("UICI_1"))
GUIDE: axis(dim(1), label("xa"))
GUIDE: axis(dim(2), label("y1"))
ELEMENT: area.difference(position(region.spread.range(xa*(LICI_1+UICI_1))), color.interior(color.lightgrey), transparency.interior(transparency."0.5"))
ELEMENT: point(position(xa*y1))
ELEMENT: line(position(xa*PRE_1), color(color.red))
END GPL.
```
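
For reference, here is one common way to write the *k - 2* non-linear bases, the truncated power form used in Harrell's writing. I am assuming the macro follows this form. The notation is mine, and the scaling may differ (Hmisc, for example, divides by $(t_k - t_1)^2$ by default). For knots $t_1 < \dots < t_k$, with $(u)_+ = \max(u, 0)$ and $j = 1, \dots, k-2$:

$$
s_j(x) = (x - t_j)_+^3 \;-\; (x - t_{k-1})_+^3 \,\frac{t_k - t_j}{t_k - t_{k-1}} \;+\; (x - t_k)_+^3 \,\frac{t_{k-1} - t_j}{t_k - t_{k-1}}
$$

Each $s_j$ is zero below $t_j$, and the cubic and quadratic terms cancel above $t_k$, which is what makes the fitted curve linear in both tails once the original variable is also in the model.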

See the macro for an example of specifying the knot locations. I also added functionality to estimate the basis by groups (for the default quantiles). My motivation was partly to replicate the nice functionality of ggplot2 to make smoothed regression estimates by groups. I don’t know off-hand though if having different knot locations between groups is a good idea, so caveat emptor and all that jazz.

I presume this is still needed functionality in SPSS, but if it is not, let me know in the comments. Other examples are floating around (see this technote and this Levesque example), but this is the first implementation of restricted cubic splines I’ve seen.

## Thomas Patrick Bernardes

/ September 11, 2013

Dear Andrew,

Thanks for the macro.

I’m currently evaluating the usefulness of splines on one of my PhD projects and using Stata syntax for that. Since I’m much more used to SPSS this really comes in handy.

I do have a question though. Since the graph takes advantage of the “ICIN” option for the regression, I cannot easily (to my knowledge) reproduce the same kind of graph for a logistic regression. With this, my question: is there a way to adapt the macro to run with logistic regressions?

Best Regards,

Thomas

## apwheele

/ September 11, 2013

Hi Thomas,

The macro doesn’t have anything to do directly with making graphs. The macro just returns a set of separate variables that are the spline basis, and can be used as independent variables in any regression (be it linear or logistic or whatever).

To recreate a similar chart to what I show for OLS, you could use the GENLIN command to save the lower and upper confidence intervals (to recreate prediction intervals it looks like you will have to calculate your own – but the save commands have all the necessary ingredients). The scatter points just end up being less revealing because all of the points are at two locations on the Y axis (either 0 or 1). People often use jittering and/or rug plots in those circumstances.
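
A minimal sketch of that GENLIN approach, assuming a hypothetical 0/1 outcome named `y` and the `xa`, `splinex1` and `splinex2` variables from the example in the post. The saved variable names (`pHat`, `pLow`, `pHigh`) are made up for illustration:

```
* Sketch only: y is a hypothetical 0/1 outcome and the saved names are illustrative.
* REFERENCE=FIRST makes the model predict the probability that y equals 1.
GENLIN y (REFERENCE=FIRST) WITH xa splinex1 splinex2
  /MODEL xa splinex1 splinex2 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /CRITERIA CILEVEL=95
  /SAVE MEANPRED(pHat) CIMEANPREDL(pLow) CIMEANPREDU(pHigh).
```

The saved `pLow` and `pHigh` could then stand in for `LICI_1` and `UICI_1` in the GGRAPH code from the post, keeping in mind that they are confidence intervals around the predicted probability rather than prediction intervals.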

If you have any other questions feel free. I figure a post on the type of chart viz. logistic regression would be a good topic and is somewhere in the to-do list, although who knows when I will have time to get to it.

## Clinton Brawner

/ May 10, 2014

Dear Andrew,

Thanks for sharing this syntax. Do you know if it could be modified to work with Cox regression?

Best wishes.

-Clinton

## apwheele

/ May 10, 2014

Yes, it can be applied to any type of regression model. All the code does is make a set of basis functions, so it turns the original variable into several new variables. These new variables (plus the original variable) should be included on the right hand side of whatever regression equation you are using.
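
As a rough sketch for the Cox case, assuming hypothetical variables `surv_time` (follow-up time), `died` (1 = event occurred) and a predictor `Age`, none of which come from the macro itself:

```
* Sketch only: surv_time, died and Age are hypothetical variable names.
* Build the spline basis first, then enter the original variable plus the new bases.
!rcs x = Age n = 4.
COXREG surv_time
  /STATUS=died(1)
  /METHOD=ENTER Age splinex1 splinex2
  /PRINT=CI(95).
```

With four knots the macro returns two bases, so three terms in total describe the effect of `Age`. The individual coefficients are hard to read on their own, which is why plotting the fitted curve is usually more informative.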

## Andreas Viberg

/ December 7, 2018

Dear Andrew!

I have had great use of your RCS-MACRO in adjusting for age in my regression model, thank you!

I wonder if I can use it to estimate odds ratios for the contrast between the 25th and 75th percentiles of age (where the RCS-MACRO was used)? In my current project I use a Cox regression model.

Best wishes! /Andreas

## apwheele

/ December 10, 2018

You could do the predicted OR at those two points and then conduct a test of the difference. I don’t see what benefits that has over drawing the plot of the OR (or predicted outcome I think is better) over the entire age range (which allows you to do the test by eye basically).
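
To spell out what that contrast involves (the notation here is mine, not something the macro reports): write $\beta_0$ for the coefficient on age itself, $\beta_j$ for the coefficient on the $j$-th spline variable, and $s_j(a)$ for the value of that spline variable at age $a$. The log of the ratio between the 75th and 25th percentile ages, $a_{75}$ and $a_{25}$, is then

$$
\log \text{ratio} = \beta_0\,(a_{75} - a_{25}) + \sum_{j=1}^{k-2} \beta_j \left[ s_j(a_{75}) - s_j(a_{25}) \right]
$$

Exponentiating gives an odds ratio in a logistic model or a hazard ratio in a Cox model. A standard error for the contrast needs the covariance matrix of the coefficients, not only their individual standard errors.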

## LThomas

/ February 1, 2019

I am a novice in statistical analysis. I am a clinician by training, getting into clinical outcomes research, and have a basic background in statistical analysis using SPSS but minimal experience with macros and syntax. Is it possible for you to post a syntax example to show how to use this macro? I have a continuous predictor variable that I want to use this macro to transform into spline variables for a competing risk analysis and am hopelessly lost.

## apwheele

/ February 1, 2019

That is what the blog post does! It is just:

FILE HANDLE macroLoc /name = "??!!File Handle to Syntax location!!??".

INSERT FILE = "macroLoc\MACRO_RCS.sps".

!rcs x = YourVariable n = 4.

OR something like:

!rcs x = YourVariable Loc = [4 10 15].

## PStratton

/ November 24, 2019

Thanks so much for your macro. I am a clinician and have been trying very hard to modify your code to accept Cox regression, without success.

I am trying to display the HR of death on the Y axis vs. Age on the X-axis (to display how they interact).

The goal is a plot similar to https://stackoverflow.com/questions/28385509/how-to-plot-a-cox-hazard-model-with-splines or https://www.researchgate.net/figure/Fig-A1-Multivariate-Cox-proportional-hazards-regression-analysis-of-mortality-risk-with_fig2_321833341.

However, COXREG is not like your typical regression. The dependent variable of HR in this case requires time-to-event data (t_death) and event status (death = 1 or 0), for all or a certain portion of your data.

The R code for rcspline is given here https://rdrr.io/cran/Hmisc/src/R/rcspline.plot.s .

If there’s any way you’d be able to perhaps modify your code or provide an example with Cox Regression I’d be extremely indebted to you. Thank you.

## apwheele

/ November 25, 2019

It will take a little bit before I have time to do a full example, but one way is to save the outfile of the coefficient estimates (or a parameter file to score new variables), and then create the values to plot in a second dataset.

That just gets you the non-linear estimate of the hazard ratio though. You could also plot survival curves at different levels of the non-linear variable.

## Clary Foote

/ February 2, 2020

As strange as this sounds, these macros are really giving me problems. I find myself bailing out to SAS and just coding there, because I have zero clue how to use the SPSS macros once they involve doing anything. I’m very unclear how to use these macros or how to specify the variable to enter or the number of splines. I feel like I need a video.

## apwheele

/ February 2, 2020

Step 1, download the sps file that contains the macro definition to your work computer, https://dl.dropboxusercontent.com/s/xndrv8vgx12v6gd/MACRO_RCS.sps?dl=0.

Step 2, use the INSERT command to import the macro definition. E.g.

INSERT FILE = "????????\MACRO_RCS.sps".

but you need to replace the question marks with the path to wherever you saved the sps file on your machine.

Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:

!rcs x = Age n = 4.

This will then return two new variables, splinex1 and splinex2. This command by default uses Frank Harrell’s rules for different percentiles to define the knot locations. If instead you wanted to say set the knots to particular locations, then you would do it like this:

!rcs x = Age loc = [20 40 60 80].

And again that will give you two new variables splinex1 and splinex2.

That is about as simple as I can make it!

## Clary Foote

/ February 2, 2020

Thanks for your expedient response. So regarding ‘Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:

!rcs x = Age n = 4.’

How do I call it? Where do I go to enter this code? I can google it I’m sure but since I’m asking. Also I’m running SPSS 26. Thanks again. CJ

## apwheele

/ February 3, 2020

You would type the command in the syntax window, here is a screenshot example.

https://lh3.googleusercontent.com/a5Zo7uXEWIyTiK3MsgQzCk8wyUT_yQCNQacaXtt6kp9uzIIKOQQhXQS-yRxUBmOByipmAN1I0rHBhOlVBq_ORVO-Rf5HYAxETuJJE7x8OF9HTCxckA-jBFBsSTCReXxpqU0pUdbqXDv_YFnCKf83VFRVCfRmflnVh4njvY6ERMqxOpUGXx8zbZf-AM4LaVEGiqoMVsb7tszYAFpDd9wDzG0-F70yn3_1ksy3UCpfxTqj2T5_sLdeFHGz1jeuMveJx-dmfz0KNfRQx6urozjasd87_ghNolKr9LTlYEKHjkz2Z4Qwqmqh-Os1Ep8wGn1TOJR8HTXq8oVDXL-6dox1KrTW9vbB3GosfbfLjTxcVBiEAonuDZI_wSEoSqyB-jpcHrSSPb5eRL6OMYr5TgxLaFTj6OSrGyW6WHIHvti_ChoE2Padq-JJCtzPnTQpyIBviLyWG4jxA2NOlW0jR89WEyJHa0yAs2IfC74_QURvTonuhSMYrhJusHFTZB434alJyHG7NZgqgUNKUETojmdHKcz4KmBXLNSjQtDUxbMBiPsSd6uTJ5SfT5wG7GbP-cToGgnRXw1hmBhoB8cHE_DaXCuHyvNjvkW6m3T-sopXGXJqrXApgfYgNTKR45mkNE7QCyfeIondve2FKOL9XIgqWcI_wSCoX-yCtZgryt68fEGdKmjcohrz4w=w1178-h345-no

## Clary J Foote

/ December 22, 2020

Andrew,

When I use the macro I can run the model in GLM no problem with a single variable and its accompanying spline variable. However, if I load the model with other variables, it causes the graph to get very erratic. Can you advise how to do a multivariable model with more than a single variable that the spline variable is corresponding to? Thanks

## apwheele

/ December 22, 2020

To make the graph, you need to set all of the other variables to some constant value and then plot the predicted fitted curve. (It doesn’t make sense to do that particular graph I show with the observed data points when you are using other variables, but it is fine to make a model with multiple independent variables.)

So you can do that using either the missing data trick, or saving the model file and scoring new data. (So if you need to do that, you should probably explicitly set the spline locations.)

## CJ

/ December 22, 2020

So I set the spline locations, as I have clinical data that would predict where I expect the “bending” to occur. I just want the predicted values to be adjusted for confounding. As you said, you can have multiple variables. However, when I place the linear variable with the spline variable (with specific cutpoints set) into the model and ask it to run predicted probabilities, it is erratic. So I’m not clear on how to generate a spline graph adjusting for the confounders while including the “splined variable” and the actual spline variable and then running the predicted probabilities along the “splined variable”, if that makes sense. Specifically, I’m looking at time to surgery, and I want to see how that predicts infection rates for certain open/compound fractures. However, there are MANY confounders. So if I can incorporate them into the spline graph, that’s what I’d like to do. Sounds like you have to set the confounders to a mean value to do this? CJ

## Clary J Foote

/ December 22, 2020

What I’ve been doing is just restricting the dataset to certain values of the big confounders and running the predicted probability with time to surgery and the spline variable, which generates a smooth, beautiful curve. But it doesn’t allow me to really build a model and graph it, which is what I’d like to do.

## apwheele

/ December 23, 2020

I will work on a new blog post illustrating how to do what I am saying. Should be able to finish that in the next day or two.

## apwheele

/ December 23, 2020

There you go, here is a new blog post illustrating my advice. https://andrewpwheeler.com/2020/12/23/graphing-spline-predictions-in-spss/