Restricted cubic splines in SPSS

I’ve made a macro to estimate a restricted cubic spline (RCS) basis in SPSS. Splines are useful exploratory tools to model non-linear relationships by transforming the independent variables in multiple regression equations. See Durrleman and Simon (1989) for a simple intro. I’ve largely based my implementation on the various advice Frank Harrell has floating around the internet (see the rcspline function in his Hmisc R package), although I haven’t read his book (yet!!).

So here is the SPSS MACRO, and below is an example of its implementation. It either takes an arbitrary number of knots and places them at default locations according to quantiles of x, or you can specify the exact locations of the knots. RCS need at least three knots, because they are restricted to be linear in the tails, and so the macro will return k – 2 bases (where k is the number of knots). Below is an example of utilizing the default knot locations, and a subsequent plot of the 95% prediction intervals and predicted values superimposed on a scatterplot.


FILE HANDLE macroLoc /name = "D:\Temp\Restricted_Cubic_Splines".
INSERT FILE = "macroLoc\MACRO_RCS.sps".

*Example of their use - data example taken from http://www-01.ibm.com/support/docview.wss?uid=swg21476694.
dataset close ALL.
output close ALL.
SET SEED = 2000000.
INPUT PROGRAM.
LOOP xa = 1 TO 35.
LOOP rep = 1 TO 3.
LEAVE xa.
END case.
END LOOP.
END LOOP.
END file.
END INPUT PROGRAM.
EXECUTE.
* EXAMPLE 1.
COMPUTE y1=3 + 3*xa + normal(2).
IF (xa gt 15) y1=y1 - 4*(xa-15).
IF (xa gt 25) y1=y1 + 2*(xa-25).
GRAPH
/SCATTERPLOT(BIVAR)=xa WITH y1.

*Make spline basis.
*set mprint on.
!rcs x = xa n = 4.
*Estimate regression equation.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10) CIN(95)
  /NOORIGIN
  /DEPENDENT y1
  /METHOD=ENTER xa  /METHOD=ENTER splinex1 splinex2
  /SAVE PRED ICIN .
formats y1 xa PRE_1 LICI_1 UICI_1 (F2.0).
*Now I can plot the observed, predicted, and the intervals.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=xa y1 PRE_1 LICI_1 UICI_1
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: xa=col(source(s), name("xa"))
 DATA: y1=col(source(s), name("y1"))
 DATA: PRE_1=col(source(s), name("PRE_1"))
 DATA: LICI_1=col(source(s), name("LICI_1"))
 DATA: UICI_1=col(source(s), name("UICI_1"))
 GUIDE: axis(dim(1), label("xa"))
 GUIDE: axis(dim(2), label("y1"))
 ELEMENT: area.difference(position(region.spread.range(xa*(LICI_1+UICI_1))), color.interior(color.lightgrey), transparency.interior(transparency."0.5"))
 ELEMENT: point(position(xa*y1))
 ELEMENT: line(position(xa*PRE_1), color(color.red))
END GPL.

See the macro for an example of specifying the knot locations. I also placed functionality to estimate the basis by groups (for the default quantiles). My motivation was partly to replicate the nice functionality of ggplot2 to make smoothed regression estimates by groups. I don’t know off-hand though if having different knot locations between groups is a good idea, so caveat emptor and all that jazz.
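
As a minimal sketch of the explicit-knot form, the call below uses the loc argument in place of n. The knot values are illustrative picks for the simulated xa variable (which runs from 1 to 35), not a recommendation, and the sketch assumes the macro file has already been loaded with INSERT as in the example above.

```spss
* Assumes MACRO_RCS.sps has already been loaded with INSERT.
* The knot values are illustrative picks for xa, not a recommendation.
* Run this in place of the n = 4 call, so the spline variables are only created once.
!rcs x = xa loc = [5 15 25 32].
* Four knots give k - 2 = 2 basis variables, splinex1 and splinex2.
REGRESSION
  /DEPENDENT y1
  /METHOD=ENTER xa splinex1 splinex2.
```

Fixing the knots by hand like this also keeps the basis identical if the same model is later applied to new data.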

I presume this is still needed functionality in SPSS, but if this was not needed let me know in the comments. Other examples are floating around (see this technote and this Levesque example), but this is the first I’ve seen of implementing the restricted cubic splines.


25 Comments

  1. Dear Andrew,

    Thanks for the macro.

    I’m currently evaluating the usefulness of splines on one of my PhD projects and using a Stata syntax for that. Since I’m much more used to SPSS this really comes in handy.

    I do have a question though. Since the graph takes advantage of the “ICIN” option for the regression, I cannot easily (to my knowledge) reproduce the same kind of graph for a logistic regression. With this, my question: is there a way to adapt the macro to run with logistic regressions?

    Best Regards,
    Thomas

    • Hi Thomas,

      The macro doesn’t have anything to do directly with making graphs. The macro just returns a set of separate variables that are the spline basis, and can be used as independent variables in any regression (be it linear or logistic or whatever).

      To recreate a similar chart to what I show for OLS, you could use the GENLIN command to save the lower and upper confidence intervals (to recreate prediction intervals it looks like you will have to calculate your own – but the save commands have all the necessary ingredients). The scatter points just end up being less revealing because all of the points are at two locations on the Y axis (either 0 or 1). People often use jittering and/or rug plots in those circumstances.

      If you have any other questions feel free. I figure a post on the type of chart viz. logistic regression would be a good topic and is somewhere in the to-do list, although who knows when I will have time to get to it.
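
      A rough sketch of the GENLIN route mentioned above, assuming a 0/1 outcome named Event and a continuous predictor named Age (both hypothetical names). The names given to the saved variables are also my own choices for illustration.

      ```spss
      * Hypothetical names: Event is a 0/1 outcome, Age is the continuous predictor.
      !rcs x = Age n = 4.
      * Logistic model via GENLIN, saving the predicted probability and its confidence limits.
      * REFERENCE=FIRST makes the lower category the reference, so the model predicts Event = 1.
      GENLIN Event (REFERENCE=FIRST) WITH Age splinex1 splinex2
        /MODEL Age splinex1 splinex2 INTERCEPT=YES
          DISTRIBUTION=BINOMIAL LINK=LOGIT
        /SAVE MEANPRED(PredProb) CIMEANPREDL(PredLow) CIMEANPREDU(PredHigh).
      ```

      The three saved variables can then stand in for PRE_1, LICI_1 and UICI_1 in the GGRAPH code from the post, keeping in mind that they are confidence limits for the mean prediction rather than prediction intervals.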

  2. Clinton Brawner

     /  May 10, 2014

    Dear Andrew,

    Thanks for sharing this syntax. Do you know if it could be modified to work with Cox regression?

    Best wishes.
    -Clinton

    • Yes it can be applied to any type of regression model. All the code does is make a set of basis functions, so it turns the original variable into several new variables. These new variables (plus the original variable) should be included on the right hand side for whatever regression equation you are using.
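
      As a minimal sketch for a Cox model, assuming a follow-up time variable t_death, an event flag death coded 1 for the event, and a predictor Age (all hypothetical names here):

      ```spss
      * Hypothetical names: t_death is follow-up time, death is the event flag (1 = event).
      !rcs x = Age n = 4.
      * The original variable and both spline bases go on the right hand side.
      COXREG t_death
        /STATUS=death(1)
        /METHOD=ENTER Age splinex1 splinex2
        /PRINT=CI(95)
        /SAVE=XBETA.
      ```

      The saved linear predictor can then be plotted against Age to see the shape of the fitted non-linear effect.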

  3. Dear Andrew!
    I have had great use of your RCS-MACRO in adjusting for age in my regression model, thank you!
    I wonder if I can use it to estimate odds ratios for the contrast between the 25th and 75th percentiles of age (where the RCS-MACRO was used)? In my current project I use a Cox regression model.

    Best wishes! /Andreas

    • You could do the predicted OR at those two points and then conduct a test of the difference — I don’t see what benefits that has over drawing the plot of the OR (or predicted outcome I think is better) over the entire age range (which allows you to do the test by eye basically).

  4. LThomas

     /  February 1, 2019

    I am a novice in statistical analysis. I am a clinician by training, getting into clinical outcomes research and have basic background in statistical analysis using SPSS but with minimal experience with macros and syntax. Is it possible for you to post a syntax example to show how to use this macro? I have a continuous predictor variable that I want to transform into spline variables with this macro for a competing risk analysis, and I am hopelessly lost.

    • That is what the blog post does! It is just:

      FILE HANDLE macroLoc /name = "??!!File Handle to Syntax location!!??".
      INSERT FILE = "macroLoc\MACRO_RCS.sps".
      !rcs x = YourVariable n = 4.

      OR something like:

      !rcs x = YourVariable Loc = [4 10 15].

  5. PStratton

     /  November 24, 2019

    Thanks so much for your macro. I am a clinician and have been trying very hard to modify your code to accept Cox regression, without success.
    I am trying to display the HR of death on the Y axis vs. Age on the X-axis (to display how they interact).
    The goal is for a plot similar to https://stackoverflow.com/questions/28385509/how-to-plot-a-cox-hazard-model-with-splines https://www.researchgate.net/figure/Fig-A1-Multivariate-Cox-proportional-hazards-regression-analysis-of-mortality-risk-with_fig2_321833341.
    However, COXREG is not like your typical regression. The dependent variable of HR in this case requires time-to-event data (t_death) and event status (death = 1 or 0), for all or a certain portion of your data.
    The R code for rcspline is given here https://rdrr.io/cran/Hmisc/src/R/rcspline.plot.s .
    If there’s any way you’d be able to perhaps modify your code or provide an example with Cox Regression I’d be extremely indebted to you. Thank you.

    • It will take a little bit before I have time to do a full example, but one way is to save the outfile of the coefficient estimates (or a parameter file to score new variables), and then create the values to plot in a second dataset.

      That just gets you the non-linear estimate of the hazard ratio though, you may do survival curves at different levels of the non-linear variable as well.

  6. As strange as this sounds, these macros are really giving me problems. I find myself bailing out to SAS and just coding there, because when it comes to the SPSS macros I have zero clue how to use them once they involve doing anything. I’m very unclear how to use this macro, how to specify the variable to enter, or the number of splines. I feel like I need a video.

    • Step 1, download the sps file that contains the macro definition to your work computer, https://dl.dropboxusercontent.com/s/xndrv8vgx12v6gd/MACRO_RCS.sps?dl=0.

      Step 2, use the INSERT command to import the macro definition. E.g.

      INSERT FILE = "????????\MACRO_RCS.sps".

      but you need to replace the question marks with the path for wherever you saved the sps file on your machine.

      Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:

      !rcs x = Age n = 4.

      This will then return two new variables, splinex1 and splinex2. This command by default uses Frank Harrell’s rules for different percentiles to define the knot locations. If instead you wanted to say set the knots to particular locations, then you would do it like this:

      !rcs x = Age loc = [20 40 60 80].

      And again that will give you two new variables splinex1 and splinex2.

      That is about as simple as I can make it!

      • Thanks for your expedient response. So regarding ‘Step 3, call the macro on your variable. So if your variable is named say “Age”, the command would be:
        !rcs x = Age n = 4.’

        How do I call it? Where do I go to enter this code? I can google it I’m sure but since I’m asking. Also I’m running SPSS 26. Thanks again. CJ

    • Norihisa Tamura

       /  March 16, 2021

      Thanks for sharing this syntax. I would like to run the analysis with this syntax, but unfortunately I can’t see the screenshot. Can you please upload the image again in another way?
      Best wishes.

      • N Tamura

         /  March 16, 2021

        I have one more request.
        I entered the syntax “! rcs x = my variables n = 4.” But it is not recognized as a command. What should I do?

      • I sometimes can’t control the images that get rendered, feel free to send me an email. This one just shows a syntax window with the commands typed in:

        **********************************.
        FILE HANDLE macroLoc /NAME = "D:\Temp\Restricted_Cubic_Splines".
        INSERT FILE = "macroLoc\MACRO_RCS.sps".
        !rcs x = Age n = 4.
        **********************************.

        For your second comment, maybe the macro has not been defined yet? Or maybe the space is causing problems.

  7. Clary J Foote

     /  December 22, 2020

    Andrew,
    When I use the macro I can run the model in GLM no problem with a single variable and its accompanying spline variable. However, if I load the model with other variables, it causes the graph to get very erratic. Can you advise how to do a multivariable model with more than a single variable that the spline variable is corresponding to? Thanks

    • To make the graph, you need to set all of the other variables to some constant value and then plot the predicted fitted curve. (It doesn’t make sense to do that particular graph I show with the observed data points when you are using other variables, but it is fine to make a model with multiple independent variables.)

      So you can do that using either the missing data trick, or saving the model file and scoring new data. (So if you need to do that, you should probably explicitly set the spline locations.)

  8. So I set the spline locations as I have clinical data that would predict where I expect the “bending” to occur. I just want the predicted values to be adjusted for confounding. As you said you can have multiple variables. However, when I place the linear variable with the spline variable (with specific cutpoints set) into the model and ask it to run predicted probabilities, it is erratic. So I’m not clear on how to generate a spline graph that adjusts for the confounders while including the “splined variable” and the actual spline variable, and then running the predicted probabilities along the “splined variable,” if that makes sense. Specifically, I’m looking at time to surgery, and I want to see how that predicts infection rates for certain open/compound fractures. However, there are MANY confounders. So if I can incorporate them into the spline graph, that’s what I’d like to do. Sounds like you have to set the confounders to a mean value to do this? CJ

  9. Clary J Foote

     /  December 22, 2020

    What I’ve been doing is just restricting the dataset for certain big confounders values and running the predicted probability with time to surgery and the spline variable. Which generates a smooth beautiful curve. But it doesn’t allow me to really build a model and graph it, which is what I’d like to do.

  1. Jittered scatterplots with 0-1 data | Andrew Wheeler
  2. Graphing Spline Predictions in SPSS | Andrew Wheeler
