Comparing samples post-matching – some helper functions after FUZZY (SPSS)

I’ve been conducting quite a few case-control or propensity score matching studies lately, so I wrote some helper functions for use after the SPSS FUZZY command. These create the case-control dataset, and also calculate some of the standardized bias metrics for continuous matching covariates.

The use case here is that you have a subset of treated individuals, and you want to draw a comparison sample matched on certain characteristics (which can include just one propensity score and/or multiple covariates). Here is the macro to follow along, and I will provide a quick walkthrough of how it works. (There is documentation in the header for what the parameters are and what the function returns.)

So first I am going to import my macro using INSERT:

*Inserting the macro.
INSERT FILE = "C:\Users\andrew.wheeler\Dropbox\Documents\BLOG\Matching_StandBias\PropBalance_Macro.sps".

Now, for illustration, I am going to make a fake dataset to show the utility of matching. Here I have a universe of 2,000 people. There is a subset of treated individuals (165), but they can only be selected if they are male and 28 or younger.

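*Create a fake dataset (the INPUT PROGRAM wrapper and the 0.6 share of males are assumptions, that part of the listing is missing).
INPUT PROGRAM.
LOOP Id = 1 TO 2000.
  END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME OrigData.
COMPUTE Male = RV.BERNOULLI(0.6).
COMPUTE YearsOld = RV.UNIFORM(18,40).
FORMATS Male (F1.0) YearsOld (F2.0).
DO IF Male = 1 AND YearsOld <= 28.
  COMPUTE Treated = RV.BERNOULLI(0.3).
ELSE.
  COMPUTE Treated = 0.
END IF.
COMPUTE #OutLogit = 0.7 + 0.5*Male - 0.05*YearsOld - 0.7*Treated.
COMPUTE #OutProb = 1/(1 + EXP(-#OutLogit)).
COMPUTE Outcome = RV.BERNOULLI(#OutProb).
FREQ Treated Outcome.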

So what happens when we make comparisons among the entire sample, which includes females and older people?

*Compare means with the original full sample.
T-TEST GROUPS=Treated(0 1) /VARIABLES=Outcome.

We get basically no difference: our treated mean is 0.40 and the untreated mean is 0.39. But instead of comparing the 165 to the entire sample, we can draw more reasonable control cases. Here we do an exact match on Male, and then a fuzzy match on YearsOld to within 3 years.

*Draw the comparison sample based on Male (exact) and YearsOld (Fuzzy).
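The FUZZY call itself does not appear in this listing. A minimal reconstruction, assuming the variable names used elsewhere in this post (Id, Treated, Match1, MGroup), might look like the following. The FUZZ values line up with the BY variables, so 0 asks for an exact match on Male and 3 allows YearsOld to differ by up to 3 years. The original call may have set other options as well, such as a seed for reproducible draws.

```spss
*Reconstruction, not the original call: exact match on Male, YearsOld within 3 years.
FUZZY BY=Male YearsOld SUPPLIERID=Id NEWDEMANDERIDVARS=Match1 GROUP=Treated
  FUZZ=0 3 MATCHGROUPVAR=MGroup.
```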

What the FUZZY command does in SPSS is create a new variable, named Match1 here, that places the matched Id in the same row as the original treated case. You cannot easily make the updated comparisons you want in this data format, though. So after writing the code to do this about 7 times, I decided to make it into a simple macro. Here is an example of calling my macro, !MatchedSample.

*Now run my macro to make the matched sample.
!MatchedSample Dataset=OrigData Id=Id Case=Treated MatchGroup=MGroup Controls=[Match1] 
  MatchVars=[YearsOld] OthVars=Outcome Male.

This spits out two new datasets, and also appends a new variable named MatchedSample to the original dataset to flag which cases have been matched. Then it is simple to see the difference in means within our matched sample.

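*Now the t-test with the matched sample subset (assumes OrigData is active and MatchedSample = 1 flags matched rows).
TEMPORARY.
SELECT IF MatchedSample = 1.
T-TEST GROUPS=Treated(0 1) /VARIABLES=Outcome.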

This shows the same mean for the treated group, 0.40 (since all the treated were matched), but the comparison group now has a mean of 0.51, so in the matched comparison the treatment reduced the outcome.

The macro also provides an additional dataset named AggStats that estimates the standardized bias in the original sample vs. the standardized bias in the matched sample. (Standardized bias is just Cohen’s d multiplied by 100.) It also calculates the standardized bias reduction for each continuous covariate. Before I forget, a neat way to test for balance jointly (instead of one variable at a time) is to estimate an additional regression predicting treatment and then test whether all coefficients are equal to zero.
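For reference, one common way to write these quantities is sketched below. This is an assumption about the general form, not a transcription of what the macro computes: the macro header is the authority on its exact denominator. Here the subscripts T and C denote the treated and comparison groups, and the denominator weights the two variances equally (the classic pooled version of Cohen’s d weights them by group size instead).

```latex
% Standardized bias for one covariate x (one common definition, assumed here)
\[
\mathrm{SB} = 100 \times \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\left(s_T^2 + s_C^2\right)/2}}
\]
% Percent reduction in absolute standardized bias after matching
\[
\text{Bias reduction} = 100 \times \left(1 - \frac{\left|\mathrm{SB}_{\text{matched}}\right|}{\left|\mathrm{SB}_{\text{original}}\right|}\right)
\]
```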

In this fake example propensity scores would not be needed; you could just estimate a typical logistic regression controlling for YearsOld and Male. But the utility of matching comes when you don’t know the functional form of how those covariates affect the outcome. So if the outcome were a very non-linear function of age, you would not have to worry about estimating that function; you could just match on age and still get a reasonable comparison of the mean difference for treated vs. not-treated.



  1. Joseph Abdulnour

     /  October 18, 2019

    Hello, thanks for the great instructions. I have a question: is it possible to do case-control matching using FUZZY with a 1:4 ratio (1 case to 4 controls)? If so, how is it done in the syntax?
    I want to match on age, gender, and diagnosis.


    • You would do that on the FUZZY command. For the NEWDEMANDERIDVARS option, just supply as many variable names as the number of matches you want. So if you wanted only one match you would do:

      NEWDEMANDERIDVARS=Match1

      But if you wanted four matches you would do:

      NEWDEMANDERIDVARS=Match1 Match2 Match3 Match4

      • Joseph Abdulnour

         /  October 18, 2019

        Hello, thanks for the instructions. Both control (n=4000) and intervention (n=19) are on the same dataset.

        Here is what I have for syntax:

        FUZZY BY= age gender diag
        NEWDEMANDERIDVARS=Match1 Match2 Match3 Match4 (I added 4 matches)
        GROUP=group FUZZ=1 0 0
        MATCHGROUPVAR=grp DS3=Agegenddiag

        When I run the analysis with the 4 Matches in the NEWDEMANDERIDVARS
        I get a warning: “Expects a string argument”
        (sorry, the warning is in French and that’s my translation of “Attend un argument de chaîne”).

        Did I do something wrong in the syntax, or miss something?

        Thanks again for your time.

      • Sorry, not sure about that one. Your BY variables are all numeric, correct? (I don’t remember whether the SUPPLIERID needs to be numeric or string specifically either.)

  2. Joseph Abdulnour

     /  October 18, 2019

    Only age is numeric. gender and diag are string. My SUPPLIERID is string.
    Do you think you can direct me to someone or some documents that can help me with this issue?
    thanks again.

    • FUZZY by default expects numbers, so just run AUTORECODE on gender and diag and supply the resulting numeric variables to the command (since you are doing exact matching on them).

      Only docs for this I know of would be if you do FUZZY /HELP (maybe there is a help you can access in the GUI as well?)
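
      As a rough sketch of that recode step, assuming hypothetical new variable names gender_n and diag_n (any unused names would do):

```spss
*gender_n and diag_n are hypothetical names for the numeric recodes.
AUTORECODE VARIABLES=gender diag /INTO gender_n diag_n.
```

      The BY list on FUZZY would then read age gender_n diag_n, with the FUZZ values kept in the same order.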

  3. Joseph Abdulnour

     /  October 18, 2019

    Thanks again for your help. I recoded the variables. I even simplified the model by using only diag (all numeric) and only 2 matches in NEWDEMANDERIDVARS, and I still get the same warning. I also forgot to mention that every time I run the analysis, the new dataset opens up with just one line, but it is not matched to anything.
    Anyway, thanks again for taking the time to help me, it’s greatly appreciated.

