The default hypothesis test that software spits out when you run a regression model is the null that the coefficient equals zero. Frequently there are other more interesting tests though, and this is one I’ve come across often: testing whether two coefficients are equal to one another. The big point to remember is that `Var(A-B) = Var(A) + Var(B) - 2*Cov(A,B)`. This formula gets you pretty far in statistics (and is one of the few I have memorized).

Note that this is not the same as testing whether one coefficient is statistically significant and the other is not. See this Andrew Gelman and Hal Stern article that makes this point. (The link is to a pre-print PDF, but the article was published in the American Statistician.) I will outline four different examples where I see people make this particular mistake.

One is when people have different models, and they compare coefficients across them. For example, say you have a base model predicting crime at the city level as a function of poverty, and then in a second model you include other control covariates on the right hand side. Let’s say the first effect estimate of poverty is `3 (1)`, where the value in parentheses is the standard error, and the second estimate is `2 (2)`. The first effect is statistically significant, but the second is not. Do you conclude that the effect sizes are different between models though? The evidence for that is much less clear.

To construct the estimate of how much the effect declined, take the difference `3 - 2 = 1`, a decrease of `1`. What is the standard error around that decrease though? We can use the formula for the variance of the difference that I noted before to construct it. The standard error squared is the variance of the parameter estimate, so `sqrt(1^2 + 2^2) =~ 2.24` is the standard error of the difference, which assumes the covariance between the estimates is zero. So the standard error around our estimated decline is quite large, and we can’t be sure that it is an appreciably different estimate of poverty between the two models.
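The arithmetic above can be wrapped in a few lines of Python. This is only a sketch: the function name `difference_test` is my own, the covariance defaults to zero as assumed in the text, and the p-value treats the ratio as a standard normal statistic.

```python
import math

def difference_test(b1, se1, b2, se2, cov=0.0):
    """Difference between two estimates, its standard error, z, and p-value.

    cov is the covariance between the two estimates; 0.0 is the assumption
    used when only published coefficients and standard errors are available.
    """
    diff = b1 - b2
    se_diff = math.sqrt(se1**2 + se2**2 - 2.0 * cov)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, z, p

# Poverty example from the text: 3 (1) in the base model, 2 (2) with controls.
diff, se_diff, z, p = difference_test(3.0, 1.0, 2.0, 2.0)
print(f"difference = {diff:.2f}, SE = {se_diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```

The same function works for any pair of published estimates; pass a non-zero `cov` if you happen to know it.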

There are more complicated ways to measure moderation, but this ad-hoc approach can be easily applied as you read other people’s work. The assumption of zero covariance for *parameter* estimates is not as big of a deal as it may seem. In large samples these covariances tend to be very small, and they are frequently *negative*. So even though we know that assumption is wrong, just pretending it is zero is not a terrible folly.

The second is where you have models predicting different outcomes. So going with our same example, say you have a model predicting property crime and a model predicting violent crime. Again, I will often see people make an equivalent mistake to the moderator scenario, and say that the effect of poverty is larger for property crime than for violent crime because one is statistically significant and the other is not.

In this case if you have the original data, you actually *can* estimate the covariance between those two coefficients. The simplest way is to estimate that covariance via seemingly unrelated regression. If you don’t though, such as when you are reading someone else’s paper, you can just assume the covariance is zero. Because the parameter estimates often have negative correlations, this assumption will make the standard error estimate *smaller*.
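To get a feel for how much the zero-covariance assumption matters, here is a small sketch that recomputes the standard error of the difference under a few assumed correlations between the two estimates. The correlation values are hypothetical, picked only for illustration, and the standard errors (1 and 2) are borrowed from the earlier poverty example.

```python
import math

def se_of_difference(se1, se2, rho):
    """SE of (b1 - b2) when the two estimates have correlation rho."""
    cov = rho * se1 * se2
    return math.sqrt(se1**2 + se2**2 - 2.0 * cov)

# Hypothetical correlations between the two coefficient estimates.
for rho in (-0.5, -0.2, 0.0, 0.2, 0.5):
    se = se_of_difference(1.0, 2.0, rho)
    print(f"rho = {rho:+.1f} -> SE of difference = {se:.2f}")
```

A negative correlation gives a larger standard error than the zero-covariance shortcut, and a positive correlation gives a smaller one.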

The third is where you have different subgroups in the data, and you examine the differences in coefficients. Say you had recidivism data for males and females, and you estimated an equation of the effect of a treatment on males and another model for females. So we have two models:

```
Model Males : Prob(Recidivism) = B_0m + B_1m*Treatment
Model Females: Prob(Recidivism) = B_0f + B_1f*Treatment
```

Where the `B_0?` terms are the intercepts, and the `B_1?` terms are the treatment effects. Here is another example where you can stack the data and estimate an interaction term to estimate the difference in the effects and its standard error. So we can estimate a combined model for both males and females as:

`Combined Model: Prob(Recidivism) = B_0c + B_1c*Treatment + B_2c*Female + B_3c*(Female*Treatment)`

Where `Female` is a dummy variable equal to 1 for female observations, and `Female*Treatment` is the interaction term for the treatment variable and the Female dummy variable. Note that you can rewrite the model for males and females as:

```
Model Mal.: Prob(Recidivism) = B_0c + B_1c *Treatment ....(when Female=0)
Model Fem.: Prob(Recidivism) = (B_0c + B_2c) + (B_1c + B_3c)*Treatment ....(when Female=1)
```

So we can interpret the interaction term, `B_3c`, as the different effect on females relative to males. The standard error of this interaction takes into account the covariance term, unlike estimating two totally separate equations would. (You can stack the property and violent crime outcomes I mentioned earlier in an analogous way to the subgroup example.)
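As a sketch of how the combined model maps onto the subgroup models, suppose the treatment is binary and the equation is fit as a linear probability model (both are my assumptions, the text does not pin down the link function). Then the four coefficients are simple contrasts of the four cell proportions. The counts below are made up for illustration, and the standard error uses the unpooled binomial variance of each cell, so it will not exactly match what OLS software reports.

```python
import math

# Hypothetical counts: (number recidivating, group size) for each cell.
cells = {
    ("male", "control"): (60, 200),
    ("male", "treated"): (40, 200),
    ("female", "control"): (30, 150),
    ("female", "treated"): (27, 150),
}

def prop_and_var(k, n):
    """Sample proportion and the estimated variance of that proportion."""
    p = k / n
    return p, p * (1.0 - p) / n

p_mc, v_mc = prop_and_var(*cells[("male", "control")])
p_mt, v_mt = prop_and_var(*cells[("male", "treated")])
p_fc, v_fc = prop_and_var(*cells[("female", "control")])
p_ft, v_ft = prop_and_var(*cells[("female", "treated")])

# Coefficients of the saturated combined model, written as cell contrasts.
b0c = p_mc                            # male control baseline
b1c = p_mt - p_mc                     # treatment effect for males
b2c = p_fc - p_mc                     # female vs. male baseline difference
b3c = (p_ft - p_fc) - (p_mt - p_mc)   # extra treatment effect for females

# In this sketch the four cells are independent samples, so variances add.
se_b3c = math.sqrt(v_mc + v_mt + v_fc + v_ft)
z = b3c / se_b3c
print(f"male effect = {b1c:.3f}, female effect = {b1c + b3c:.3f}")
print(f"interaction = {b3c:.3f}, SE = {se_b3c:.3f}, z = {z:.2f}")
```

The female treatment effect is recovered as `b1c + b3c`, matching the rewritten female equation above.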

The fourth and final example is the simplest: two regression coefficients in the same equation. One example is from my dissertation, the correlates of crime at small spatial units of analysis. I test whether different places that sell alcohol (such as liquor stores, bars, and gas stations) have the same effect on crime. For simplicity I will just test two effects, whether liquor stores have the same effect as on-premise alcohol outlets (this includes bars and restaurants). So let’s say I estimate a Poisson regression equation as:

`log(E[Crime]) = Intercept + b1*Bars + b2*LiquorStores`

And then my software spits out:

```
                  B      SE
Liquor Stores   0.36    0.10
Bars            0.24    0.05
```

And then let’s say we also have the variance-covariance matrix of the parameter estimates, which most stat software will return for you if you ask it:

```
                Liquor_Stores     Bars
Liquor_Stores       0.01
Bars               -0.0002       0.0025
```

On the diagonal are the variances of the parameter estimates, whose square roots equal the standard errors reported in the first table. So the difference estimate is `0.36 - 0.24 = 0.12`, and the standard error of that difference is `sqrt(0.01 + 0.0025 - 2*(-0.0002)) =~ 0.11`. So the difference is not statistically significant. You can take the ratio of the difference and its standard error, here `0.12/0.11`, and treat that as a test statistic from a normal distribution. So the rule that it needs to be beyond plus or minus two to be statistically significant at the 0.05 level applies.
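The same calculation can be written as a linear contrast, `c'b` with variance `c'Vc`, which is handy once there are more than two coefficients in the model. The sketch below uses the coefficients and covariance matrix entries exactly as printed in the two tables above; the function name `wald_contrast` is just a label I chose.

```python
import math

# Coefficients and covariance matrix as printed in the tables above
# (order: liquor stores, bars).
coef = [0.36, 0.24]
vcov = [[0.01, -0.0002],
        [-0.0002, 0.0025]]

def wald_contrast(coef, vcov, contrast):
    """Estimate, standard error, and z statistic for a linear contrast c'b."""
    k = len(coef)
    est = sum(contrast[i] * coef[i] for i in range(k))
    var = sum(contrast[i] * vcov[i][j] * contrast[j]
              for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return est, se, est / se

# Contrast (1, -1) picks out liquor stores minus bars.
est, se, z = wald_contrast(coef, vcov, [1.0, -1.0])
# Two-sided p-value, treating z as standard normal.
p = math.erfc(abs(z) / math.sqrt(2.0))
print(f"difference = {est:.2f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

With a longer coefficient vector you would pad the contrast with zeros for the coefficients not being compared.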

This is called a Wald test specifically. I will follow up with another blog post and some code examples on how to do these tests in SPSS and Stata. For completeness and just because, I also list two more ways to accomplish this test for the last example.

The first is a likelihood ratio test.

So we have the full model as:

```
log(E[Crime]) = b0 + b1*Bars + b2*Liquor_Stores [Model 1]
```

And we have the reduced model as:

```
log(E[Crime]) = b4 + b5*(Bars + Liquor_Stores) [Model 2]
```

So we just estimate the full model with Bars and Liquor Stores on the right hand side (Model 1), then estimate the reduced model (Model 2) with the sum of `Bars + Liquor Stores` on the right hand side. Then you can just do a chi-square test based on the change in the log-likelihood. In this case there is a change of one degree of freedom.
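Once the two models are fit, the test itself is only a couple of lines. This sketch assumes you have already pulled the log-likelihoods out of your software; the two values plugged in below are hypothetical placeholders, not results from any model in this post. It relies on the fact that for one degree of freedom the chi-square upper tail can be written with the complementary error function.

```python
import math

def likelihood_ratio_test_1df(loglik_full, loglik_reduced):
    """LR statistic and p-value for a single restriction (1 degree of freedom)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df the chi-square upper tail equals erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p

# Hypothetical log-likelihoods for Model 1 (full) and Model 2 (reduced).
stat, p = likelihood_ratio_test_1df(loglik_full=-1050.2, loglik_reduced=-1050.9)
print(f"LR statistic = {stat:.2f}, p = {p:.3f}")
```

For restrictions with more than one degree of freedom you would need a general chi-square distribution function from a stats library instead of this shortcut.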

I give an example of doing this in R on Cross Validated. This test is nice because it extends to testing multiple coefficients, for example if I wanted to test `bars=liquor stores=convenience stores`. The prior individual Wald tests are not as convenient for testing the equality of more than two coefficients at once.

Here is another way though to have the computer more easily spit out the Wald test for the difference between two coefficients in the same equation. So if we have the model (lack of intercept does not matter for discussion here):

`y = b1*X + b2*Z [eq. 1]`

We can test the null that b1 = b2 by rewriting our linear model as:

`y = B1*(X + Z) + B2*(X - Z) [eq. 2]`

And the test for the B2 coefficient is our test of interest. The logic goes like this: we can expand `[eq. 2]` to be:

`y = B1*X + B1*Z + B2*X - B2*Z [eq. 3]`

which you can then regroup as:

`y = X*(B1 + B2) + Z*(B1 - B2) [eq. 4]`

and note the equalities between equations 4 and 1:

`B1 + B2 = b1; B1 - B2 = b2`

So B2 tests the difference between the two coefficients, while B1 is their average. B2 is a little tricky to interpret in terms of effect size for how much larger b1 is than b2, because it is only half of the difference (`B2 = (b1 - b2)/2`). An easier way to estimate that effect size is to put `(X - Z)/2` on the right hand side in place of `(X - Z)`; the coefficient on that term, along with its confidence interval, is then the estimate of how much larger the effect of X is than Z.

Note that this gives an estimate equivalent to conducting the Wald test by hand as I mentioned before.
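Here is a quick numerical check of that equivalence on simulated data, written in plain Python so nothing beyond the standard library is needed. Everything in it is an illustrative assumption: the helper `ols_two_predictors` is my own no-intercept two-variable OLS, and the data-generating coefficients are arbitrary.

```python
import math
import random

def ols_two_predictors(x1, x2, y):
    """No-intercept OLS of y on x1 and x2; returns coefficients and covariance."""
    n = len(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * yi for a, yi in zip(x1, y))
    t2 = sum(b * yi for b, yi in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Inverse of the 2x2 cross-product matrix X'X.
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    coef = [inv[0][0] * t1 + inv[0][1] * t2,
            inv[1][0] * t1 + inv[1][1] * t2]
    rss = sum((yi - coef[0] * a - coef[1] * b) ** 2
              for a, b, yi in zip(x1, x2, y))
    sigma2 = rss / (n - 2)
    vcov = [[sigma2 * inv[i][j] for j in range(2)] for i in range(2)]
    return coef, vcov

# Simulated data; the true coefficients 0.36 and 0.24 are arbitrary choices.
random.seed(1)
X = [random.gauss(0, 1) for _ in range(500)]
Z = [0.5 * x + random.gauss(0, 1) for x in X]
y = [0.36 * x + 0.24 * z + random.gauss(0, 1) for x, z in zip(X, Z)]

# Equation 1: y = b1*X + b2*Z, then the Wald difference by hand.
b, v = ols_two_predictors(X, Z, y)
diff = b[0] - b[1]
se_diff = math.sqrt(v[0][0] + v[1][1] - 2.0 * v[0][1])

# Reparameterized model: y = B1*(X + Z) + B2*(X - Z)/2.
sum_xz = [x + z for x, z in zip(X, Z)]
half_diff_xz = [(x - z) / 2.0 for x, z in zip(X, Z)]
B, V = ols_two_predictors(sum_xz, half_diff_xz, y)

print(f"by hand:         {diff:.4f} (SE {se_diff:.4f})")
print(f"reparameterized: {B[1]:.4f} (SE {math.sqrt(V[1][1]):.4f})")
```

Both lines should print the same estimate and standard error, since the second model is only a linear re-expression of the first.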

## Thom

/ May 26, 2017

This is a really clear summary. I’d also add that the reparameterization to b1 * (x1+x2)/2 and b2 * (x1-x2) is also sometimes useful for handling collinearity when you have two highly correlated predictors that are also capturing some nuanced distinction.

## Kate

/ August 8, 2019

Hi Andrew, thanks so much for the explanation. I currently encounter a similar question: to test the equality of two regression coefficients from two different models but in the same sample. I need to test whether the cross-sectional effects of an independent variable are the same at two time points. Since the effects/regression coefficients may be correlated at the two time points, and I don’t know how to calculate their covariance, could you advise what to do?

## apwheele

/ August 8, 2019

From your description you can likely stack the models and construct an interaction effect. So something like

`y_it = B0 + B1*(X) + B2*(Time Period = 2) + B3*(X*Time Period = 2)`

Then the B3 effect is the difference in the X effect across the two time periods.

(A complication of this is you should account for correlated errors across the shared units in the two groups, such as via clustered standard errors or random/fixed effects for units.)

If X does not change over the two time periods, you could do the SUR approach and treat the two time periods as different dependent variables, see https://andrewpwheeler.wordpress.com/2017/06/12/testing-the-equality-of-coefficients-same-independent-different-dependent-variables/.

(Which is another way to account for the correlated errors across the models.)

## liuchaoyu@gmail.com

/ August 8, 2019

Thanks Andrew. Can the model also apply when the DV is measured at two different times but the IVs are the same across time? Say, can I use it to compare the effect of parent educational level on children’s grades at year 1 with its effect on year 2 grades, since the year 1 grade will definitely be correlated with year 2? Thanks again!

## apwheele

/ August 9, 2019

Just based on that description I would use a multi-level growth type model, with a random intercept for students. Then you just have the covariates as I stated.