I’ve been conducting quite a few case-control or propensity score matching studies lately, so I wrote some helper functions for use after the SPSS FUZZY command. These create the case-control dataset and calculate some standardized bias metrics for the continuous covariates used in the matching.
The use case here is that you have a subset of treated individuals, and you want to draw a comparison sample matched on certain characteristics (a single propensity score, multiple covariates, or both). Here is the macro to follow along, and I will provide a quick walkthrough of how it works. (There is documentation in the header for what the parameters are and what the function returns.)
So first I am going to import my macro using INSERT:
*Inserting the macro.
INSERT FILE = "C:\Users\andrew.wheeler\Dropbox\Documents\BLOG\Matching_StandBias\PropBalance_Macro.sps".
Now I am going to make a fake dataset to illustrate the utility of matching. Here I have a universe of 2,000 people. There is a subset of treated individuals (165), but a person is only eligible for treatment if they are male and 28 or younger.
*Create a fake dataset.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 2000.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME OrigData.
COMPUTE Male = RV.BERNOULLI(0.7).
COMPUTE YearsOld = RV.UNIFORM(18,40).
FORMATS Male (F1.0) YearsOld (F2.0).
DO IF Male = 1 AND YearsOld <= 28.
COMPUTE Treated = RV.BERNOULLI(0.3).
ELSE.
COMPUTE Treated = 0.
END IF.
COMPUTE #OutLogit = 0.7 + 0.5*Male - 0.05*YearsOld - 0.7*Treated.
COMPUTE #OutProb = 1/(1 + EXP(-#OutLogit)).
COMPUTE Outcome = RV.BERNOULLI(#OutProb).
FREQ Treated Outcome.
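To see why the full-sample comparison that follows is misleading, it helps to plug a few people into the outcome equation used in the simulation. The snippet below is a small Python sketch (not part of the SPSS syntax; the function name `outcome_prob` is made up for illustration) that evaluates the same logit for a treated and an untreated 25-year-old male, and for an untreated 40-year-old female.

```python
import math

def outcome_prob(male, years_old, treated):
    """Outcome probability implied by the simulation's logit equation."""
    logit = 0.7 + 0.5 * male - 0.05 * years_old - 0.7 * treated
    return 1.0 / (1.0 + math.exp(-logit))

# Same person (25-year-old male), with and without treatment.
p_untreated = outcome_prob(male=1, years_old=25, treated=0)
p_treated = outcome_prob(male=1, years_old=25, treated=1)

# Someone who can never be treated in this setup: a 40-year-old female.
p_older_female = outcome_prob(male=0, years_old=40, treated=0)

print(round(p_untreated, 3), round(p_treated, 3), round(p_older_female, 3))
```

For the young male the treatment lowers the outcome probability from roughly 0.49 to 0.32, while the older female has a low baseline probability (about 0.21) without any treatment. Pooling people like her into the untreated group drags the untreated mean down and hides the treatment effect.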
So what happens when we make comparisons among the entire sample, which includes females and older people?
*Compare means with the original full sample.
T-TEST GROUPS=Treated(0 1) /VARIABLES=Outcome.
We get basically no difference: the treated mean is 0.40 and the untreated mean is 0.39. But instead of comparing the 165 treated cases to the entire sample, we can draw more reasonable control cases. Here we do an exact match on Male, and then a fuzzy match on YearsOld to within 3 years.
*Draw the comparison sample based on Male (exact) and YearsOld (Fuzzy).
FUZZY BY=Male YearsOld SUPPLIERID=Id NEWDEMANDERIDVARS=Match1 GROUP=Treated
EXACTPRIORITY=FALSE FUZZ=0 3 MATCHGROUPVAR=MGroup DRAWPOOLSIZE=CheckSize
/OPTIONS SAMPLEWITHREPLACEMENT=FALSE MINIMIZEMEMORY=TRUE SHUFFLE=TRUE SEED=10.
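The matching rule itself (exact on Male, within 3 years on YearsOld, each control used at most once) can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a simple greedy draw. It is not how FUZZY is actually implemented, and the function name `greedy_match` and the toy records are hypothetical.

```python
import random

def greedy_match(cases, pool, fuzz=3.0, seed=10):
    """Greedy 1:1 matching without replacement.

    A control is eligible when Male is identical and YearsOld differs by
    at most `fuzz`. Returns {case Id: matched control Id or None}.
    """
    rng = random.Random(seed)
    order = list(cases)
    rng.shuffle(order)  # randomize which case gets to pick first
    available = {c["Id"]: c for c in pool}
    matches = {}
    for case in order:
        eligible = [
            cid for cid, ctl in available.items()
            if ctl["Male"] == case["Male"]
            and abs(ctl["YearsOld"] - case["YearsOld"]) <= fuzz
        ]
        if eligible:
            chosen = rng.choice(eligible)
            matches[case["Id"]] = chosen
            del available[chosen]  # without replacement
        else:
            matches[case["Id"]] = None
    return matches

cases = [
    {"Id": 1, "Male": 1, "YearsOld": 22.0},
    {"Id": 2, "Male": 1, "YearsOld": 27.0},
]
pool = [
    {"Id": 10, "Male": 1, "YearsOld": 21.0},  # within 3 years of case 1 only
    {"Id": 11, "Male": 0, "YearsOld": 22.0},  # right age, fails exact match on Male
    {"Id": 12, "Male": 1, "YearsOld": 35.0},  # too old for either case
    {"Id": 13, "Male": 1, "YearsOld": 29.0},  # within 3 years of case 2 only
]

matches = greedy_match(cases, pool)
print(matches)
```

In this toy pool each case has exactly one eligible control, so case 1 is paired with Id 10 and case 2 with Id 13 regardless of the shuffle order.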
Now what the FUZZY command does in SPSS is create a new variable, named Match1 here, that places the matched Id in the same row as the original treated case. You cannot easily make the updated comparisons that you want in this data format, though. So after writing the code to do this about 7 times, I decided to make it into a simple macro. Here is an example of calling my macro, !MatchedSample.
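To make the data-format problem concrete, here is a rough Python sketch of the reshaping step: going from one row per person with the match Id stored in a Match1 column to a stacked dataset holding only matched cases and their controls. It is a sketch of the idea under simplified assumptions, not the macro's actual implementation. The function name `matched_sample` and the toy rows are invented for illustration.

```python
def matched_sample(rows, id_var="Id", case_var="Treated", control_vars=("Match1",)):
    """Stack each matched case with its controls, one row per person."""
    by_id = {r[id_var]: r for r in rows}
    keep_ids = []
    for r in rows:
        if r[case_var] != 1:
            continue  # only treated rows carry match Ids
        control_ids = [r[v] for v in control_vars if r[v] is not None]
        if not control_ids:
            continue  # treated case with no match found
        keep_ids.append(r[id_var])
        keep_ids.extend(control_ids)
    # Drop the match-Id columns, which are meaningless in the stacked layout.
    return [
        {k: v for k, v in by_id[i].items() if k not in control_vars}
        for i in keep_ids
    ]

# Wide layout as left by the matching step: Match1 sits on the treated row.
rows = [
    {"Id": 1, "Treated": 1, "Outcome": 0, "Match1": 4},
    {"Id": 2, "Treated": 1, "Outcome": 1, "Match1": None},  # unmatched case
    {"Id": 3, "Treated": 0, "Outcome": 1, "Match1": None},  # never used as a control
    {"Id": 4, "Treated": 0, "Outcome": 1, "Match1": None},  # control for case 1
]

long_rows = matched_sample(rows)
for r in long_rows:
    print(r)
```

Only Ids 1 and 4 survive, and in the stacked layout a plain comparison of Outcome by Treated works directly. Passing more names in `control_vars` would handle several controls per case.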
*Now run my macro to make the matched sample.
!MatchedSample Dataset=OrigData Id=Id Case=Treated MatchGroup=MGroup Controls=[Match1]
MatchVars=[YearsOld] OthVars=Outcome Male.
This then spits out two new datasets, and also appends a new variable named MatchedSample to the original dataset to show which cases have been matched. Then it is simple to see the difference in means within our matched sample.
*Now the t-test with the matched sample subset.
DATASET ACTIVATE MatchedSamples.
T-TEST GROUPS=Treated(0 1) /VARIABLES=Outcome.
This shows the same mean for the treated group, 0.40 (since all the treated cases were matched), but the comparison group now has a mean of 0.51, so here the treatment reduced the outcome.
The macro also provides an additional dataset named AggStats that estimates the standardized bias in the original sample vs. the standardized bias in the matched sample. (Standardized bias is just Cohen’s d multiplied by 100.) It also calculates the standardized bias reduction for each continuous covariate. Before I forget, a neat way to test for balance jointly (instead of one variable at a time) is to estimate an additional regression equation predicting treatment and then test whether all coefficients are equal to zero.
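As a worked example of the standardized bias calculation, here is a short Python sketch with made-up ages. It assumes the pooled standard deviation is the square root of the average of the two group variances and that bias reduction is the percent drop in absolute bias. The macro may define either quantity slightly differently, so check its header before comparing numbers.

```python
import math
from statistics import mean, variance

def standardized_bias(treated, control):
    """100 * (difference in means) / pooled SD (assumed: average of variances)."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    return 100.0 * (mean(treated) - mean(control)) / pooled_sd

# Made-up ages, purely for illustration.
treated_age = [20, 22, 24, 26]
full_control_age = [20, 25, 30, 35, 40]  # everyone who was not treated
matched_control_age = [21, 22, 25, 26]   # controls kept after matching

bias_before = standardized_bias(treated_age, full_control_age)
bias_after = standardized_bias(treated_age, matched_control_age)

# Assumed definition: percent reduction in absolute standardized bias.
bias_reduction = 100.0 * (1.0 - abs(bias_after) / abs(bias_before))

print(round(bias_before, 1), round(bias_after, 1), round(bias_reduction, 1))
```

With these numbers the bias on age falls from about -119 in the full sample to about -20 after matching, a reduction of roughly 83 percent.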
In this fake example the propensity scores would not be needed; you could just estimate a typical logistic regression equation controlling for YearsOld and Male. But the utility of matching comes when you don’t know the functional form of how those covariates affect the outcome. If the outcome were a very non-linear function of age, you would not have to worry about estimating that function. You can just match on age and still get a reasonable comparison of the mean difference for treated vs. not-treated.
Joseph Abdulnour
/ October 18, 2019
Hello, thanks for the great instructions. I have a question: is it possible to do case-control matching using FUZZY with a 1:4 ratio (1 case for 4 controls)? If so, how is it done in the syntax?
I want to match on age, gender, and diagnosis.
Thanks
apwheele
/ October 18, 2019
You would do that on the FUZZY command. For the option NEWDEMANDERIDVARS, just supply one new variable name for each match you want. So if you wanted only one match you would do:
NEWDEMANDERIDVARS=Match1
But if you wanted four matches you would do:
NEWDEMANDERIDVARS=Match1 Match2 Match3 Match4
Joseph Abdulnour
/ October 18, 2019
Hello, thanks for the instructions. Both control (n=4000) and intervention (n=19) are in the same dataset.
Here is what I have for syntax:
FUZZY BY= age gender diag
SUPPLIERID=ChartNumber
NEWDEMANDERIDVARS=Match1 Match2 Match3 Match4 (I added 4 matches)
GROUP=group FUZZ=1 0 0
EXACTPRIORITY=FALSE
MATCHGROUPVAR=grp DS3=Agegenddiag
/OPTIONS SAMPLEWITHREPLACEMENT=FALSE MINIMIZEMEMORY=TRUE SHUFFLE=FALSE.
When I run the analysis with the 4 matches in NEWDEMANDERIDVARS, I get a warning: “waiting for a string argument”.
Sorry, the warning is in French and that’s my translation (Attend un argument de chaîne).
Did I do something wrong in the syntax, or miss something?
Thanks again for your time.
apwheele
/ October 18, 2019
Sorry, not sure about that one. Your BY variables are all numeric, correct? (I don’t remember if the SUPPLIERID needs to be numeric or string specifically either.)
Joseph Abdulnour
/ October 18, 2019
Only age is numeric. gender and diag are string. My SUPPLIERID is string.
Do you think you can direct me to someone or some documentation that can help me with this issue?
thanks again.
apwheele
/ October 18, 2019
FUZZY by default expects numbers, so just apply AUTORECODE to gender and diag and supply the recoded numeric variables to the command (since you are doing exact matching on them).
The only docs for this that I know of are what you get from FUZZY /HELP (maybe there is a help page you can access in the GUI as well?).
Joseph Abdulnour
/ October 18, 2019
Thanks again for your help. I recoded the variables. I even simplified the model to use only diag (all numeric) and only 2 matches in NEWDEMANDERIDVARS, and I still get the same warning. I also forgot to mention that every time I run the analysis, the new dataset opens with just one row, which is not matched to anything.
Anyway, thanks again for taking the time to help me. It’s greatly appreciated.