# Restricted cubic splines in SPSS

I’ve made a macro to estimate restricted cubic spline (RCS) bases in SPSS. Splines are useful exploratory tools for modeling non-linear relationships by transforming the independent variables in multiple regression equations. See Durrleman and Simon (1989) for a simple intro. I’ve largely based my implementation on the various advice Frank Harrell has floating around the internet (see the `rcspline.eval` function in his Hmisc R package), although I haven’t read his book (yet!!).

So here is the SPSS MACRO, and below is an example of its implementation. You can either give it a number of knots, and it places them at default locations based on quantiles of x, or you can specify the exact knot locations yourself. RCS need at least three knots because they are restricted to be linear in the tails, so the macro returns k - 2 bases (where k is the number of knots). Below is an example that uses the default knot locations, followed by a plot of the predicted values and 95% prediction intervals superimposed on a scatterplot.
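For readers who want to see what the bases actually are, below is the truncated-power form given by Harrell, which is what Hmisc’s `rcspline.eval` computes up to a scaling constant. I’m assuming the macro builds the same functions, though its scaling may differ. With knots t_1 < t_2 < ... < t_k and (u)_+ = max(u, 0):

``````latex
S_j(x) = (x - t_j)_+^3 - (x - t_{k-1})_+^3 \, \frac{t_k - t_j}{t_k - t_{k-1}} + (x - t_k)_+^3 \, \frac{t_{k-1} - t_j}{t_k - t_{k-1}}, \qquad j = 1, \dots, k-2

f(x) = \beta_0 + \beta_1 x + \sum_{j=1}^{k-2} \gamma_j \, S_j(x)
``````

Expanding S_j(x) for x > t_k, the x^3 and x^2 coefficients cancel, so the fit is linear beyond the last knot; below t_1 every S_j is zero, so it is linear there too. That restriction is why k knots add only k - 2 new variables. By default `rcspline.eval` also divides each S_j by (t_k - t_1)^2, which rescales the γ_j but leaves the fitted curve unchanged.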

``````
FILE HANDLE macroLoc /name = "D:\Temp\Restricted_Cubic_Splines".
INSERT FILE = "macroLoc\MACRO_RCS.sps".

*Example of their use; data example taken from http://www-01.ibm.com/support/docview.wss?uid=swg21476694.
dataset close ALL.
output close ALL.
SET SEED = 2000000.
INPUT PROGRAM.
LOOP xa = 1 TO 35.
LOOP rep = 1 TO 3.
LEAVE xa.
END case.
END LOOP.
END LOOP.
END file.
END INPUT PROGRAM.
EXECUTE.
* EXAMPLE 1.
COMPUTE y1=3 + 3*xa + normal(2).
IF (xa gt 15) y1=y1 - 4*(xa-15).
IF (xa gt 25) y1=y1 + 2*(xa-25).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y1.

*Make spline basis.
*set mprint on.
!rcs x = xa n = 4.
*Estimate regression equation.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10) CIN(95)
/NOORIGIN
/DEPENDENT y1
/METHOD=ENTER xa  /METHOD=ENTER splinex1 splinex2
/SAVE PRED ICIN .
formats y1 xa PRE_1 LICI_1 UICI_1 (F2.0).
*Now I can plot the observed, predicted, and the intervals.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=xa y1 PRE_1 LICI_1 UICI_1
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: xa=col(source(s), name("xa"))
DATA: y1=col(source(s), name("y1"))
DATA: PRE_1=col(source(s), name("PRE_1"))
DATA: LICI_1=col(source(s), name("LICI_1"))
DATA: UICI_1=col(source(s), name("UICI_1"))
GUIDE: axis(dim(1), label("xa"))
GUIDE: axis(dim(2), label("y1"))
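ELEMENT: area.difference(position(region.spread.range(xa*(LICI_1+UICI_1))), color.interior(color.grey), transparency.interior(transparency."0.5"))
ELEMENT: point(position(xa*y1))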
ELEMENT: line(position(xa*PRE_1), color(color.red))
END GPL.
``````

See the macro for an example of specifying the knot locations. I also added functionality to estimate the basis by groups (for the default quantiles). My motivation was partly to replicate the nice ggplot2 functionality for drawing smoothed regression estimates by group. I don’t know offhand whether having different knot locations across groups is a good idea, though, so caveat emptor and all that jazz.
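A sketch of how the default and explicit knot options relate. Harrell’s recommended default percentiles are 10, 50, 90 for three knots; 5, 35, 65, 95 for four; and 5, 27.5, 50, 72.5, 95 for five. I’m assuming the macro’s `n =` option follows these, since it is described as using Harrell’s rules. The code prints the four-knot percentiles with `FREQUENCIES` and then passes knots explicitly with the `loc =` keyword. The knot values in the `!rcs` line are placeholders, not output from this data, and the call is meant to replace the `n = 4` call above rather than follow it. Note that SPSS’s percentile definition can differ slightly from R’s `quantile()`, so the knots may not match Hmisc exactly.

``````
* Print Harrell's default 4-knot percentiles of xa (frequency table suppressed).
FREQUENCIES VARIABLES=xa
/FORMAT=NOTABLE
/PERCENTILES=5 35 65 95.
* Pass explicit knots. These values are placeholders: swap in the printed percentiles.
!rcs x = xa loc = [3 13 23 33].
``````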

I presume this is still needed functionality in SPSS, but if it isn’t, let me know in the comments. Other examples are floating around (see this technote and this Levesque example), but this is the first I’ve seen that implements restricted cubic splines.

1. #### Thomas Patrick Bernardes

/  September 11, 2013

Dear Andrew,

Thanks for the macro.

I’m currently evaluating the usefulness of splines in one of my PhD projects and using Stata syntax for that. Since I’m much more used to SPSS, this really comes in handy.

I do have a question, though. Since the graph takes advantage of the “ICIN” option for the regression, I cannot easily (to my knowledge) reproduce the same kind of graph for a logistic regression. So my question is: is there a way to adapt the macro to run with logistic regressions?

Best Regards,
Thomas

• #### apwheele

/  September 11, 2013

Hi Thomas,

The macro doesn’t have anything to do directly with making graphs. The macro just returns a set of separate variables that are the spline basis, and can be used as independent variables in any regression (be it linear or logistic or whatever).

To recreate a similar chart to what I show for OLS, you could use the GENLIN command to save the lower and upper confidence intervals (to recreate prediction intervals it looks like you will have to calculate your own – but the save commands have all the necessary ingredients). The scatter points just end up being less revealing because all of the points are at two locations on the Y axis (either 0 or 1). People often use jittering and/or rug plots in those circumstances.

If you have any other questions, feel free to ask. I figure a post on this type of chart for logistic regression would be a good topic, and it is somewhere on my to-do list, although who knows when I will have time to get to it.
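Following the GENLIN suggestion above, here is a minimal sketch of a logistic model on the simulated data from the post. The binary outcome `ybin` and the saved variable names `phat`, `phat_lo`, and `phat_hi` are hypothetical. `(REFERENCE=FIRST)` is intended to make GENLIN model P(ybin = 1); check the model information table in the output to confirm which category is modeled. The saved bounds are confidence intervals for the predicted probability, not prediction intervals.

``````
* Hypothetical binary outcome simulated from the continuous y1.
COMPUTE ybin = RV.BERNOULLI(1/(1 + EXP(-(y1 - 35)/5))).
EXECUTE.
* Logistic regression with the linear term plus the spline basis.
GENLIN ybin (REFERENCE=FIRST) WITH xa splinex1 splinex2
/MODEL xa splinex1 splinex2 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
/SAVE MEANPRED(phat) CIMEANPREDL(phat_lo) CIMEANPREDU(phat_hi).
* Overlay the predicted probability and its interval bounds against xa.
GRAPH
/SCATTERPLOT(OVERLAY)=xa xa xa WITH phat phat_lo phat_hi (PAIR).
``````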

2. #### Clinton Brawner

/  May 10, 2014

Dear Andrew,

Thanks for sharing this syntax. Do you know if it could be modified to work with Cox regression?

Best wishes.
-Clinton

• #### apwheele

/  May 10, 2014

Yes it can be applied to any type of regression model. All the code does is make a set of basis functions, so it turns the original variable into several new variables. These new variables (plus the original variable) should be included on the right hand side for whatever regression equation you are using.
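To make “turns the original variable into several new variables” concrete, here is a sketch that computes a four-knot basis by hand with `COMPUTE`, following Harrell’s truncated-power formula for knots t1 < t2 < t3 < t4. The knots 5, 15, 25, 32 and the names `rcs_a1` and `rcs_a2` are hypothetical. The macro may scale its variables differently, which changes the coefficients but not the fitted values.

``````
* Hand-computed RCS basis for xa, assuming hypothetical knots 5 15 25 32.
* Cases with missing xa are skipped, so their basis values stay system-missing.
DO IF NOT MISSING(xa).
* Truncated cubes (x - t)^3 for x > t, else 0, held in scratch variables.
COMPUTE #p1 = MAX(xa - 5, 0)**3.
COMPUTE #p2 = MAX(xa - 15, 0)**3.
COMPUTE #p3 = MAX(xa - 25, 0)**3.
COMPUTE #p4 = MAX(xa - 32, 0)**3.
* k - 2 = 2 bases; the last two knots (25, 32) enforce linearity past 32.
COMPUTE rcs_a1 = #p1 - #p3*(32 - 5)/(32 - 25) + #p4*(25 - 5)/(32 - 25).
COMPUTE rcs_a2 = #p2 - #p3*(32 - 15)/(32 - 25) + #p4*(25 - 15)/(32 - 25).
END IF.
EXECUTE.
``````

These variables, or the macro’s `splinex1` and `splinex2`, then go on the right-hand side next to the original variable, e.g. `/METHOD=ENTER xa rcs_a1 rcs_a2` in REGRESSION, or the equivalent list in LOGISTIC REGRESSION, GENLIN, or COXREG.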

3. #### Andreas Viberg

/  December 7, 2018

Dear Andrew!
I have had great use of your RCS-MACRO for adjusting for age in my regression model, thank you!
I wonder if I can use it to estimate odds ratios for the contrast between the 25th and 75th percentiles of age (where the RCS-MACRO was used)? In my current project I use a Cox regression model.

Best wishes! /Andreas

• #### apwheele

/  December 10, 2018

You could do the predicted OR at those two points and then conduct a test of the difference — I don’t see what benefits that has over drawing the plot of the OR (or predicted outcome I think is better) over the entire age range (which allows you to do the test by eye basically).
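One way to write the contrast described here, assuming age enters the model only through its linear term and the k - 2 spline variables S_j (no interactions). Other covariates cancel in the difference, so the log hazard ratio (or log odds ratio, for a logistic model) between ages x_b and x_a, e.g. the 75th and 25th percentiles, is a linear combination of the age coefficients, with a standard error from their estimated covariance matrix V:

``````latex
\log \mathrm{HR}(x_b \text{ vs } x_a) = \beta_1 (x_b - x_a) + \sum_{j=1}^{k-2} \gamma_j \left[ S_j(x_b) - S_j(x_a) \right] = \mathbf{c}^\top \boldsymbol{\theta}

\mathbf{c} = \left( x_b - x_a,\; S_1(x_b) - S_1(x_a),\; \dots,\; S_{k-2}(x_b) - S_{k-2}(x_a) \right)^\top, \qquad \boldsymbol{\theta} = (\beta_1, \gamma_1, \dots, \gamma_{k-2})^\top

\widehat{\mathrm{Var}}(\mathbf{c}^\top \hat{\boldsymbol{\theta}}) = \mathbf{c}^\top \hat{V} \mathbf{c}, \qquad 95\%\ \text{CI for HR} = \exp\!\left( \mathbf{c}^\top \hat{\boldsymbol{\theta}} \pm 1.96 \sqrt{\mathbf{c}^\top \hat{V} \mathbf{c}} \right)
``````

The S_j values at the two ages must be computed with the same knots used in the fitted model.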

4. #### LThomas

/  February 1, 2019

I am a novice in statistical analysis. I am a clinician by training, getting into clinical outcomes research, and have a basic background in statistical analysis using SPSS but minimal experience with macros and syntax. Is it possible for you to post a syntax example to show how to use this macro? I have a continuous predictor variable that I want to transform into spline variables with this macro for a competing risk analysis, and I am hopelessly lost.

• #### apwheele

/  February 1, 2019

That is what the blog post does! It is just:

``````
FILE HANDLE macroLoc /name = "??!!File Handle to Syntax location!!??".
INSERT FILE = "macroLoc\MACRO_RCS.sps".
!rcs x = YourVariable n = 4.
``````

Or something like:

``````
!rcs x = YourVariable Loc = [4 10 15].
``````

5. #### PStratton

/  November 24, 2019

Thanks so much for your macro. I am a clinician and have been trying very hard to modify your code to accept Cox regression, without success.
I am trying to display the HR of death on the Y axis vs. Age on the X-axis (to display how they interact).
The goal is for a plot similar to https://stackoverflow.com/questions/28385509/how-to-plot-a-cox-hazard-model-with-splines https://www.researchgate.net/figure/Fig-A1-Multivariate-Cox-proportional-hazards-regression-analysis-of-mortality-risk-with_fig2_321833341.
However, COXREG is not like your typical regression. The dependent variable of HR in this case requires time-to-event data (t_death) and event status (death = 1 or 0), for all or a certain portion of your data.
The R code for rcspline is given here https://rdrr.io/cran/Hmisc/src/R/rcspline.plot.s .
If there’s any way you’d be able to perhaps modify your code or provide an example with Cox Regression I’d be extremely indebted to you. Thank you.

• #### apwheele

/  November 25, 2019

It will take a little while before I have time to do a full example, but one way is to save an outfile of the coefficient estimates (or a parameter file to score new data), and then create the values to plot in a second dataset.

That just gets you the non-linear estimate of the hazard ratio, though; you may want survival curves at different levels of the non-linear variable as well.
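A shortcut that differs from the reply’s approach and that I haven’t checked against it: have COXREG save the linear predictor and exponentiate it. The names `t_death`, `death`, and `Age` come from the question above; `lp` and `hr_age` are hypothetical. COXREG centers the covariates at their means, so `EXP(lp)` is a hazard ratio relative to that mean covariate profile, not relative to one particular age. If other covariates are in the model, their effects are mixed into `lp` too, and the coefficient-based approach in the reply is cleaner. This sketch also gives no confidence band.

``````
* Build the spline basis for Age (assumes the macro has been INSERTed).
!rcs x = Age n = 4.
* Cox model with the linear term plus the spline basis; save the linear predictor.
COXREG t_death
/STATUS=death(1)
/METHOD=ENTER Age splinex1 splinex2
/SAVE=XBETA(lp).
* Exponentiate to get the hazard ratio relative to the mean covariate profile.
COMPUTE hr_age = EXP(lp).
EXECUTE.
GRAPH
/SCATTERPLOT(BIVAR)=Age WITH hr_age.
``````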

6. #### Clary Foote

/  February 2, 2020

As strange as this sounds, these macros are really giving me problems. I find myself bailing out to SAS and just coding there, because once the SPSS macros involve doing anything, I have zero clue how to use them. I’m very unclear how to use these macros, how to specify the variable to enter, or how to set the number of splines. I feel like I need a video.

• #### apwheele

/  February 2, 2020

Step 2, use the INSERT command to import the macro definition, e.g.

``````
INSERT FILE = "????????\MACRO_RCS.sps".
``````

but you need to replace the question marks with the path to wherever you saved the sps file on your machine.

Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:

``````
!rcs x = Age n = 4.
``````

This will then return two new variables, splinex1 and splinex2. This command by default uses Frank Harrell’s rules for different percentiles to define the knot locations. If instead you wanted to say set the knots to particular locations, then you would do it like this:

``````
!rcs x = Age loc = [20 40 60 80].
``````

And again that will give you two new variables splinex1 and splinex2.

That is about as simple as I can make it!

• #### Clary Foote

/  February 2, 2020

Thanks for your expedient response. So regarding ‘Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:
!rcs x = Age n = 4.’

How do I call it? Where do I go to enter this code? I can google it I’m sure but since I’m asking. Also I’m running SPSS 26. Thanks again. CJ
