In my ask me anything series, Rob Case writes in with a question about interpreting one-sided tests for the difference in coefficients:
Mr. Wheeler,
Thank you for your page https://andrewpwheeler.com/2016/10/19/testing-the-equality-of-two-regression-coefficients/
I did your technique (at the end of the page) of re-running the model with X+Z and X-Z as independent variables (with coefficients B1 and B2, respectively).
I understand:
- (although you did not say so) that testing whether coefficient b1 (X’s coefficient in the original equation) is LESS THAN coefficient b2 (Z’s coefficient in the original regression) is a one-sided test; and testing whether one coefficient is DIFFERENT from another is a two-sided test
- that the 90%-confidence t-distribution-critical-values-with-infinite-degrees-of-freedom are 1.282 for one-sided tests and 1.645 for two-sided tests
- that if the resulting t-stat for the B2 coefficient is say 1.5, then—according to the tests—I should therefore be 90% confident that b1 is in fact less than b2; and I should NOT be 90% confident that b1 is different from b2.
But—according to MY understanding of logic and statistics—if I am 90% confident that b1 is LESS THAN b2, then I would be MORE THAN 90% confident that b1 DIFFERS from b2 (because “differs” includes the additional chance that b1 is greater than b2), i.e. the tests and my logic conflict. What am I doing wrong?
Rob
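For background, here is a minimal sketch of the reparameterization trick Rob is describing, using simulated data and python's statsmodels (the data, sample size, and variable names are just for illustration). Since y = b1*X + b2*Z can be rewritten as y = [(b1+b2)/2]*(X+Z) + [(b1-b2)/2]*(X-Z), the default t-test on the X-Z coefficient is a test of b1 = b2.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 1000
X = rng.normal(size=n)
Z = rng.normal(size=n)
y = 0.5*X + 0.8*Z + rng.normal(size=n)

# original model, y ~ X + Z, with coefficients b1 and b2
orig = sm.OLS(y, sm.add_constant(np.column_stack([X, Z]))).fit()

# reparameterized model, y ~ (X+Z) + (X-Z)
# the coefficient on (X-Z) equals (b1 - b2)/2, so its t-stat tests b1 = b2
repar = sm.OLS(y, sm.add_constant(np.column_stack([X + Z, X - Z]))).fit()
print(repar.summary())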
So I realize null hypothesis statistical testing (NHST) can be tricky to interpret – but the statement in the third bullet point is not consistent with how we do NHST, for several reasons.
So if we have a null hypothesis that Beta1 = Beta2, for reasons to do with the central limit theorem we actually rewrite this in terms of the difference:
Null: Theta = Beta1 - Beta2 = 0
I’ve noted this new parameter we are testing – the difference in the two coefficients – as Theta, and its assumed value under the null hypothesis, 0, as Theta0. For NHST we assume Theta equals the null value Theta0, and then test to see how close our data are to that assumption. So we estimate with our data:
Diff = b1 - b2
DiffZ = Diff / StandardError_Diff
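In code, you can also estimate Diff and DiffZ directly from the fitted model, using the fact that Var(b1 - b2) = Var(b1) + Var(b2) - 2*Cov(b1, b2). A minimal sketch, continuing the simulated statsmodels example above:

b = orig.params          # [intercept, b1, b2]
V = orig.cov_params()    # covariance matrix of the coefficient estimates

diff = b[1] - b[2]                                # Diff = b1 - b2
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2*V[1, 2])  # StandardError_Diff
diff_z = diff / se_diff                           # DiffZ
print(diff, se_diff, diff_z)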
Now, to calculate a p-value, we need to say how unlikely our data estimate, DiffZ, is under the null distribution centered at Theta0. So imagine we draw our standard normal distribution curve about Theta0. This then defines the space for NHST; for a typical two sided test we have (here assuming DiffZ is a negative value):
P(Z < DiffZ | Theta0 ) + P(Z > -DiffZ | Theta0 ) = Two tailed p-value
We partition the space into tail areas because, with a continuous distribution of potential outcomes, the probability of observing any exact value of DiffZ is zero. For a one sided test, you do not add the two portions together, but only take the tail in the direction of your alternative hypothesis:
P(Z < DiffZ | Theta0 ) = One tail p-value for the alternative Beta1 < Beta2
P(Z > DiffZ | Theta0 ) = One tail p-value for the alternative Beta1 > Beta2
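To make those tail areas concrete, here is a sketch using scipy's standard normal distribution, plugging in DiffZ = -1.5 (the negative-signed version of Rob's hypothetical t-stat of 1.5):

from scipy.stats import norm

diff_z = -1.5  # hypothetical estimate, assuming DiffZ is negative

two_tail = norm.cdf(diff_z) + norm.sf(-diff_z)  # P(Z < DiffZ) + P(Z > -DiffZ), ~0.134
one_tail_less = norm.cdf(diff_z)                # P(Z < DiffZ), ~0.067, alternative Beta1 < Beta2
one_tail_greater = norm.sf(diff_z)              # P(Z > DiffZ), ~0.933, alternative Beta1 > Beta2

Note that 1 - 0.067 = 0.933 is simply the one-tailed p-value for the opposite alternative, not a statement of 93% confidence in anything.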
Note here that the test is conditional on the null hypothesis. Statements such as ‘I should therefore be 90% confident that b1 is in fact less than b2’, which take the complement of the p-value (i.e. 1 – p-value) and interpret it as a meaningful probability, are incorrect.
P-values are basically numerical summaries of how close the data are to the presumed null distribution. Small p-values just indicate the data are not close to the assumed null distribution. The complement of the p-value is not evidence for the alternative hypothesis – it is just the left over probability mass of the null distribution inside the Z values.
Statisticians at this point in the conversation oftentimes suggest Bayesian analysis, interpreting posterior probabilities instead of p-values. I will stop here though, as I am not sure “90% confident” readily translates into a specific Bayesian statement. (It could be people are better off doing inferiority/equivalence testing, e.g. changing the null hypothesis, as in the sketch below.)
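For the curious, here is a hedged sketch of what an equivalence version of the test could look like, using the two one-sided tests (TOST) procedure, where the null becomes |Beta1 - Beta2| >= delta for some substantively chosen delta (the function and the delta value are just illustrative):

from scipy.stats import norm

def tost_pvalue(diff, se_diff, delta):
    # two one-sided tests of the composite null |Theta| >= delta
    z_lower = (diff + delta) / se_diff  # z-stat at the boundary Theta = -delta
    z_upper = (diff - delta) / se_diff  # z-stat at the boundary Theta = +delta
    p_lower = norm.sf(z_lower)          # P(Z > z_lower), tests Theta <= -delta
    p_upper = norm.cdf(z_upper)         # P(Z < z_upper), tests Theta >= +delta
    return max(p_lower, p_upper)        # significant only if both one-sided tests are

# e.g. with diff and se_diff from the earlier snippet and delta = 0.1
# print(tost_pvalue(diff, se_diff, 0.1))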