This question comes up frequently in my intro stats class. We are covering paired (dependent) samples hypothesis tests, and the textbook gives students a fuzzy discussion of “d-bar” and “s sub d,” along with a rather complex formula for finding the latter.

“d-bar” (my editor is not letting me use the correct symbol, but this is the letter “d” with a bar over it) is just the average of the differences (d) in the two samples.

“s sub d” (editor again) is just the plain old standard deviation of the differences, despite this rather complex formula:

Here is an example of how to find them using basic Excel functions:

Here is the Excel worksheet with formulas:

Unfortunately, StatCrunch does not give you “s sub d” directly in any of the built-in analyses, but you can find both values by the following method:
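If you prefer code to Excel or StatCrunch, the same two values take only a few lines of Python. This is just a sketch with made-up paired data, but the arithmetic is exactly what the Excel functions do: d-bar is the plain average of the differences, and “s sub d” is the ordinary sample standard deviation of those differences.

```python
# Sketch: d-bar and s_d from paired samples (the data here are hypothetical)
before = [210, 195, 188, 202, 190]
after  = [205, 190, 185, 200, 186]

d = [b - a for b, a in zip(before, after)]   # the differences
n = len(d)

d_bar = sum(d) / n                           # mean of the differences ("d-bar")
s_d = (sum((x - d_bar) ** 2 for x in d) / (n - 1)) ** 0.5  # plain sample std dev

print(d_bar, s_d)
```

Note that `s_d` uses the n − 1 divisor, matching Excel's STDEV.S rather than STDEV.P.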


The general rule is to not round any intermediate values in a series of calculations and, instead, carry forward as many decimal places as possible. For example, if you need to first calculate the standard deviation for use in a later equation, keep as many decimal places as possible (no less than 6 is my recommendation) in your standard deviation value.
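To see why early rounding matters, here is a small sketch with hypothetical numbers: rounding a standard deviation to one decimal place before using it in a later calculation changes the final answer in the third decimal place, which is exactly the kind of digit MyStatLab checks.

```python
# Sketch: rounding an intermediate value too early changes the final answer
# (s and n below are hypothetical)
import math

s = 4.836915                              # std dev carried to 6 decimal places
n = 35

se_full = s / math.sqrt(n)                # full precision carried forward
se_rounded = round(s, 1) / math.sqrt(n)   # s rounded to 4.8 first -- too early

print(round(se_full, 3), round(se_rounded, 3))
```

The two standard errors disagree once you round the final answers to three decimals, so only the full-precision version would match the graded answer.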

But when you enter your answers into MyStatLab (MSL), round to the required number of decimal places. When MSL says “two decimal places as required” or “one decimal place as needed,” that means **exactly** that many decimal places are required.

In this example from 7.3, MSL asks for the “nearest thousandth as needed.” That means you must enter exactly three numbers after the decimal.

If you enter 2.85, you will be counted wrong. If you enter 2.8449, you will be counted wrong. For example, see images below:

Generally, MSL does allow a tolerance level around the exact answer, e.g., 4.2004 ± 0.0001. And on some problems, MSL also includes alternate answers where “technology” gives a slightly different value than interpolating from a table, again within a tolerance range.

But when it comes to the number of decimal places, MSL is strict.

This may seem a bit picky, but in the real world, our bosses will not be happy if we do not follow instructions.

Later in the course, things will get more complicated regarding degrees of freedom.

Let me try to give a short explanation of degrees of freedom for this part of the course.

Suppose we have a sample of five weights of puppies. To the nearest pound – we have a cruddy scale – they are 3, 4, 4, 5, ?. If I tell you the total of the five weights is 22 pounds, and then tell you the first four weights (3, 4, 4, 5), can you find the 5th weight? You should say “Sure!” Just add up the four weights – 16 pounds – and subtract from the overall total – 22 pounds – to find the missing weight is 6 pounds.

Once you know four of the five, the fifth weight is not free to vary. That sample of 5 puppy weights has just n − 1 = 4 degrees of freedom.
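The puppy example above is small enough to check in a couple of lines of Python:

```python
# The puppy example: once the total is fixed, the fifth weight is determined
total = 22
known = [3, 4, 4, 5]            # the first four weights, in pounds
fifth = total - sum(known)      # the last weight is not free to vary
print(fifth)                    # 6

df = len(known + [fifth]) - 1   # n - 1 degrees of freedom
print(df)                       # 4
```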

Although the math gets tricky in later parts of the course (after the midterm), the concept of degrees of freedom is similar to this.

[My **Excel** calculator for running this test is found here.]

As always for proportion problems, we first have to check that n*p and n*q (where q = 1 − p) are both greater than 5.

If they are, we can use the normal approximation to the binomial (a proportion test is essentially a binomial test). That is why **for all the proportion problems we do in this course, we use the z-test**. If either n*p or n*q is not greater than 5, we have to use a test that is beyond the scope of this course.
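The condition check is easy to turn into a one-off helper; the sample sizes and proportions below are hypothetical, chosen just to show one case that fails and one that passes.

```python
# Sketch: checking the normal-approximation conditions n*p > 5 and n*q > 5
def normal_approx_ok(n, p):
    q = 1 - p
    return n * p > 5 and n * q > 5

print(normal_approx_ok(100, 0.04))   # n*p = 4        -> condition fails
print(normal_approx_ok(200, 0.04))   # n*p = 8, n*q = 192 -> condition holds
```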

Step 1, as is almost always true for us, is to click on **Stat** once StatCrunch is open. The sequence is **Stat>Proportion Stats>One Sample>With Summary**. Then enter the number of successes (which is just n*p) and the sample size, select the hypothesis test for p, enter the null hypothesis value, and select the math operator for Ha. Click **Compute!** The answer window opens, and you can see the test statistic of z = -1.042 and the p-value of 0.29. Although this problem asks us to find the critical value and make our decision based on that, the p-value always agrees and tells us we Fail to Reject the null since p > α.
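Under the hood, StatCrunch is computing the standard one-sample proportion z statistic. Here is a hedged sketch of that calculation; the `x`, `n`, and `p0` values below are hypothetical, since the problem's actual summary numbers are not shown above.

```python
# Sketch: the one-sample proportion z-test StatCrunch runs behind the dialog
# (x, n, and p0 are hypothetical stand-ins, not the problem's real values)
import math
from statistics import NormalDist

def one_prop_ztest(x, n, p0):
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)             # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p_value

z, p = one_prop_ztest(x=42, n=100, p0=0.47)
print(round(z, 3), round(p, 3))
```

With these made-up inputs, z lands near -1 and p is well above a typical α of 0.10, so we would again Fail to Reject the null.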

I used the **Stat>Calculators>Normal** path to bring up the normal calculator. Because this is a two-tailed test (recall Ha has the ≠ math operator), I like to use the **Between** button and enter the confidence level c, which is 1-α or 0.9. The calculator shows the two critical values and rejection areas. Because the test statistic of -1.042 does not fall in either rejection area, we get the same decision of **Fail to Reject the null**.

The final part is to draw a conclusion. Recall the alternative was the claim, so we say: **Fail to Reject Ho. The data do NOT provide sufficient evidence to Support the claim.**

Boys will be boys: Data error prompts U-turn on study of sex differences in school (Retraction Watch, 2017)

The article concerns a peer-reviewed study on self-regulation of study habits that was published earlier this year. In the retraction, the authors noted they had discovered a “coding error” that flipped the outcome of their research. The standard procedure for coding dummy variables is “1” = Yes and “0” = No, but that convention trips up a bit when it comes to gender. Generally, male is coded “1”, but here the researcher doing the basic coding used “1” to mean female. No one noticed, and the authors drew their conclusions as if the data had been coded the opposite way, with “1” meaning male.

I noticed several students made a similar “error” when setting up the multiple regression in M3A2 on the value of a fireplace: a “coding error” in using a dummy variable to represent the categorical variable “fireplace” in the regression.

As you probably know, a regression requires quantitative variables, but in our database the variable Fireplace contains either a True or a False text value. When we use a dummy variable, we replace a categorical text value with a number. In the Fireplace problem, a logical way to do that (logical for me, anyway) would be to use a “1” to indicate the presence of a fireplace and a “0” to indicate no fireplace. Some students chose to do the reverse: “0” for Fireplace = True and “1” for Fireplace = False.

The error I am speaking of is not that – deciding to use “0” for Fireplace = True. The error comes in not understanding what either choice means for how you interpret the outcome of the regression.

If you chose “0” for Fireplace = True and did the multiple regression correctly, you came out with a beta2 coefficient of about -$5567 for the dummy variable.

If you did the opposite and used “1” for Fireplace = True, you found a beta2 coefficient of +$5567.

The error happens when you try to interpret these outcomes.

It is straightforward to interpret the outcome if you let “1” indicate the presence of a fireplace (Image 1): having a fireplace adds about $5567 in value to the sales price of a typical house. You see this when you use the CI/PI worksheet to forecast home values by putting either a “0” or a “1” into the calculator.

Image 1

But some students who used “0” to indicate the *presence of a fireplace* came up with an incorrect conclusion: that the *presence of a fireplace reduced* the price of a home because beta2 was negative.

In reality, for the students who used “0” to indicate the presence of a fireplace, the negative beta2 tells you that **not** having a fireplace reduces the value of a home, just the opposite.

Image 2

Final thought: you may notice that the y-intercepts under the two coding methods also differ. But if you check, you will see that the y-intercept for the coding Fireplace = True = “0” is $5567 greater than the y-intercept for Fireplace = True = “1”. That makes sense: under that coding, the starting point should be greater because the baseline assumption is that the house has a fireplace.
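Both effects, the sign flip on the dummy coefficient and the shift in the y-intercept, can be demonstrated in a stripped-down sketch. I use hypothetical sale prices and a single dummy predictor here (the course regression has more predictors, where it gives about ±$5567); with one binary predictor, the least-squares slope is simply the difference in group means and the intercept is the mean of the group coded “0”.

```python
# Sketch: flipping the 0/1 dummy coding flips the coefficient's sign and
# shifts the intercept (hypothetical sale prices, single-predictor case)
fireplace = [98000, 102000, 110000]      # homes WITH a fireplace
no_fireplace = [93000, 96000, 104000]    # homes WITHOUT a fireplace

def mean(xs):
    return sum(xs) / len(xs)

# Coding A: 1 = fireplace. Slope = mean(with) - mean(without); the
# intercept is the predicted price when the dummy is 0 (no fireplace).
b_a = mean(fireplace) - mean(no_fireplace)
intercept_a = mean(no_fireplace)

# Coding B: 1 = NO fireplace. Same magnitude, opposite sign; the intercept
# rises by |slope| because the baseline house now HAS a fireplace.
b_b = mean(no_fireplace) - mean(fireplace)
intercept_b = mean(fireplace)

print(b_a, intercept_a)
print(b_b, intercept_b)
```

Either coding fits the data equally well; only the interpretation of the coefficient and the intercept changes, which is exactly the trap described above.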

Retraction Watch. (2017, Oct). *Boys will be boys: Data error prompts U-turn on study of sex differences in school*. Retrieved from Retraction Watch: http://retractionwatch.com/2017/10/17/boys-will-boys-data-error-prompts-u-turn-study-sex-differences-school/


You argue that statistical literacy gives citizens a kind of power. What do you mean?

What I mean is that if we don’t have the ability to process quantitative information, we can often make decisions that are more based on our beliefs and our fears than based on reality. On an individual level, if we have the ability to think quantitatively, we can make better decisions about our own health, about our own choices with regard to risk, about our own lifestyles. It’s very empowering to not be scared or bullied into doing things one way or another.

On a collective level, the impact of being educated in general is huge. Think about what democracy would be if most of us couldn’t read. We aspire to a literate society because it allows for public engagement, and I think this is also true for quantitative literacy. The more we can get people to understand how to view the world in a quantitative way, the more successful we can be at getting past biases and beliefs and prejudices. (Bleicher, 2017)

I just found an interesting article that suggests students who ask a lot of questions do better in courses. My own anecdotal evidence supports this idea. Try it. I think you will find Excelsior instructors ready and willing to help.

I know there is pressure, either self-inflicted or from external sources, to try to rush through your degree as fast as possible. For many, that means always taking 8-week term courses. In my experience teaching introductory statistics, I have seen students do well in the 8-week terms, but I have seen too many students struggle in them. Perhaps, as I believe, statistics is one of those courses where time is required for the concepts and ideas to jell and firm up.

I stumbled across an interesting article while researching cognitive load and found this: “When you have nothing to think about, you can do your best thinking. You don’t even have to be in the shower.” (Baer, 2016)

In a related article, I found Stanford researcher Emma Seppälä saying:

We need to find ways to give our brains a break…. At work, we’re intensely analyzing problems, organizing data, writing—all activities that require focus. During downtime, we immerse ourselves in our phones while standing in line at the store or lose ourselves in Netflix after hours. (Seppälä, 2017)

Taking courses in the 8-week term format, especially if you take more than one at a time, can easily be a form of information overload. Moreover, the 8-week terms do not give you much freeboard if one of life’s frequent surprises shows up.

My “two cents” is that you should build in time for your brain to recharge after work and studies. Time to be with your family and time to be alone. Taking the 15-week version of a course now and then may help give you that time to recharge. That is not a sign of weakness or selfishness.

That is being smart.

Baer, D. (2016, June 20). ‘Unloaded’ Minds Are the Most Creative. Retrieved from Science of Us: http://nymag.com/scienceofus/2016/06/unloaded-minds-are-the-most-creative.html

Seppälä, E. (2017, May 8). Happiness research shows the biggest obstacle to creativity is being too busy. Retrieved from Quartz: https://qz.com/978018/happiness-research-shows-the-biggest-obstacle-to-creativity-is-being-too-busy/?utm_source=qzfb