This headline popped up in my newsfeed this morning:

Boys will be boys: Data error prompts U-turn on study of sex differences in school (Retraction Watch, 2017)

The story concerns a peer-reviewed paper on self-regulation of study habits that was published earlier this year. In the retraction, the authors noted they had discovered a “coding error” that flipped the outcome of their research. Although the standard procedure for coding dummy variables is “1” = Yes and “0” = No, that convention does not map cleanly onto gender. Generally, male is coded “1”, but here the researcher doing the basic coding used “1” to mean female. No one noticed, and the authors drew their conclusions as if “1” meant male.

I noticed several students made a similar “error” when setting up the multiple regression in M3A2 on the value of a fireplace. It, too, was a “coding error”: it arose when using a dummy variable to include the categorical variable “Fireplace” in the regression.

As you probably know, a regression requires quantitative variables but, in our database, we have a variable Fireplace that contains either a True or a False text value. When we use a dummy variable, we replace a categorical text value with a number. In the Fireplace problem, a logical way to do that (logical to me, at least) would be to use a “1” to indicate the presence of a fireplace and a “0” to indicate no fireplace. Some students chose to do the reverse: “0” for Fireplace = True and “1” for Fireplace = False.
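As a minimal sketch of the two coding choices, here is how the text values could be turned into a dummy variable in Python. The five sample values and the names `coding_a` and `coding_b` are made up for illustration; they are not the M3A2 data.

```python
# Hypothetical sample of the text values found in a Fireplace column.
fireplace_text = ["True", "False", "True", "True", "False"]

# Coding A: 1 = has a fireplace, 0 = no fireplace.
coding_a = [1 if value == "True" else 0 for value in fireplace_text]

# Coding B: the reverse, 0 = has a fireplace, 1 = no fireplace.
coding_b = [1 - d for d in coding_a]

print(coding_a)  # [1, 0, 1, 1, 0]
print(coding_b)  # [0, 1, 0, 0, 1]
```

Either list is a legitimate dummy variable. What matters is remembering which category the “1” stands for when you read the regression output.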

The error I am speaking of is not the decision to use “0” for Fireplace = True. The error lies in not understanding what either choice means for how you interpret the outcome of the regression.

If you chose “0” for Fireplace = True and did the multiple regression correctly, you came out with a beta2 coefficient of about -$5567 for the dummy variable.

If you did the opposite and used “1” for Fireplace = True, you found a beta2 coefficient of +$5567.
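The sign flip can be reproduced with a small sketch. Everything here is an assumption for illustration: the data are synthetic and noise-free (not the course data set), the second predictor is a hypothetical living-area variable, and the fit uses a hand-written normal-equations solver rather than the course spreadsheet.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data: living area in thousands of square feet (hypothetical predictor).
size = [1.2, 1.5, 1.8, 2.1, 2.4, 1.35]
has_fireplace = [1, 0, 1, 0, 1, 0]
# Prices built so that a fireplace is worth exactly $5567 (made-up intercept and slope).
price = [80000 + 60000 * s + 5567 * f for s, f in zip(size, has_fireplace)]

# Coding A: dummy = 1 when the house has a fireplace.
coef_a = ols(list(zip(size, has_fireplace)), price)
# Coding B: dummy = 1 when the house has NO fireplace.
coef_b = ols([(s, 1 - f) for s, f in zip(size, has_fireplace)], price)

print(round(coef_a[2]))  # 5567
print(round(coef_b[2]))  # -5567
```

The data and the fitted prices are identical in both runs. Only the label attached to the “1” changed, and the sign of the dummy coefficient changed with it.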

The error happens when you try to interpret these outcomes.

It is straightforward to interpret the outcome if you let “1” indicate the presence of a fireplace (Image 1): having a fireplace adds about $5567 to the sales price of a typical house. You can see this when you use the CI/PI worksheet to forecast home values by putting either a “0” or a “1” in the calculator.

Image 1

But some students who used “0” to indicate the presence of a fireplace came up with an incorrect conclusion: that the presence of a fireplace reduced the price of a home because beta2 was negative.

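In reality, for the students who used “0” to indicate the presence of a fireplace, the dummy variable equals “1” only for homes without a fireplace. The negative beta2 therefore tells you that not having a fireplace reduces the value of a home by about $5567, which is just the opposite of what those students concluded.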

Image 2
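A quick way to check the interpretation is to plug both dummy values into the fitted equation, much as you would in the forecast calculator. In this sketch the intercept is a made-up number and the other predictors are assumed to be held fixed (and so are left out); only the -5567 coefficient comes from the discussion above.

```python
# Coding used here: 0 = has a fireplace, 1 = no fireplace.
INTERCEPT = 85567.0  # hypothetical value, for illustration only
BETA2 = -5567.0      # dummy coefficient under this coding


def forecast(dummy):
    """Forecast price with every other predictor held fixed (omitted here)."""
    return INTERCEPT + BETA2 * dummy


with_fireplace = forecast(0)     # dummy = 0 means the house HAS a fireplace
without_fireplace = forecast(1)  # dummy = 1 means it does not

print(with_fireplace - without_fireplace)  # 5567.0
```

The house with the fireplace still forecasts about $5567 higher, even though beta2 is negative.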

Final thought: you may notice that the y-intercepts for the two coding methods are also different. If you check, you will see that the y-intercept for Fireplace=True=0 is $5567 greater than the y-intercept for Fireplace=True=1. That makes sense: under the first coding the dummy is “0” for a house that has a fireplace, so the starting point already includes the value of the fireplace.
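A short derivation shows why the sign flip and the intercept shift go together. The notation is my own shorthand, not the worksheet's: D is the dummy with “1” meaning fireplace, D' = 1 - D is the reversed dummy, and x stands for whatever other predictor is in the model.

```latex
\begin{align*}
\hat{y} &= \beta_0 + \beta_1 x + \beta_2 D
  && \text{(coding with } D = 1 \text{ for a fireplace)} \\
        &= \beta_0 + \beta_1 x + \beta_2 (1 - D')
  && \text{(substitute } D = 1 - D' \text{)} \\
        &= (\beta_0 + \beta_2) + \beta_1 x - \beta_2 D'
  && \text{(coding with } D' = 1 \text{ for no fireplace)}
\end{align*}
```

With beta2 of about 5567, the reversed coding has an intercept that is $5567 larger and a dummy coefficient of -5567, while the forecast for any given house is unchanged.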

Retraction Watch. (2017, Oct). Boys will be boys: Data error prompts U-turn on study of sex differences in school. Retrieved from Retraction Watch: