F-test

F-test Critical Area

I teach graduate statistics, otherwise known as quantitative methods/analytics. I have found a lot of good statistics tutorials on You Tube and have recommended some to my students. Because these students are MBA students and not Ph.D. students, I am more concerned with my students leaving the course knowing how to use technology to help them solve business problems than with their having an in-depth knowledge of statistical theory or memorizing equations. There are many technology tools, i.e. statistical software, available, some free and some expensive for an individual to buy. Fortunately, most MBA students become at least familiar with Excel during their MBA if they were not already familiar with it. Moreover, many, if not most, companies use

Microsoft Office as their basic IT tools. So I have moved my recommendation to my students away from more dedicated (and expensive) programs such as SPSS and toward Excel.

So it was not unusual for me to look for good tutorials on You Tube for the F-test using Excel. And I found many. Unfortunately, all I found were focused on cases where raw data is available. This is great and I pointed my students to some of them. But I could not find one that would help them for their homework and exams in our auxiliary course management system where most problems give them intermediate data such as means and standard deviations for samples. So I decided to write/make one.

I am going to use Camtasia to capture my screen while I solve several problems using Excel. But it is now necessary to include closed captioning for materials used in most online courses and I need to do that. Which brings me to this, my draft script for solving F-test problems using Excel’s built-in functions. I’ll of course have to augment this with the actual words I record but Camtasia has a tool for that (thank goodness).

Soo with that long-winded introduction…Dawn on solving F-test problems with Excel.

The F-Test using Excel

When we want to compare the variance of two populations, we use the F-test. In the F-test, we make one important assumption that results in a ‘simplified’ calculation process. This important assumption is that the null hypothesis is true. But I bet that does not surprise you.

The F-test statistic we use is simply the ratio of the two population variances: F = s12/s22. If the null hypothesis is true, the ratio will be equal to 1; the variances are equal. But we have specific requirements for calculating the F-statistic. The larger sample variance is always s1, the numerator; the smaller sample variance is always s2, the denominator. This means the F-statistic has values greater than 1.

I need to regress a bit and talk about the F-distribution (figure 1). The F-distribution is named after, as you might guess, a famous statistician R.A. Fisher – the Fisher F probability distribution. As you can see, the F-distribution ranges from 0 to infinity with a peak value a little less than x = 1. For our purposes, we are most interested in the area above x = 1, but the smaller values are important and we will look at that a bit later in this discussion.

Figure 1: F-distribution

F-test

F-test Critical Area

It is worth remembering that the F-distribution is really a family of distribution curves that vary according to the degrees of freedom of the numerator and denominator in our equation for F. A specific F-distribution is denoted as F(ndf, ddf) where ndf is the numerator degrees of freedom and ddf is the denominator degrees of freedom, e.g. F(3,8). Important: F(3,8) is not equal to F(8,3).

To find our critical value of F, Fcrit, we need another determinant, the value of probability, our alpha. With these three factors, we can use F-tables and I’ll leave that process to others to explain, e.g. here. Here we are interested in using ‘technology’ to conduct our F-test, specifically Excel.

When we test hypotheses using the F-test, there are our usual suspects for our alternative hypothesis: >, <, and ≠. These, of course, are right-tailed, left tailed, and two-tailed tests.

Figure 2: F-statistic Critical Areas.

 

F-test critical areas

F-test Critical Areas

Graph credit: Flatworldknowledge.com/shafer

One carry over from F-tables, however, is that all F calculations in Excel are for right tail tests of hypothesis, i.e. s12>s22. (Remember we have defined F such that the larger value of the variances is always the numerator; this forces the test into the right tail). Because the F-distribution is not symmetrical and there are no negative values, you cannot just take the right critical value and use the opposite of it as the left critical value. And what do we do about two-tailed tests?

Without going into a detailed explanation “why,” to find the left critical value, we reverse the sequence of degrees of freedom, i.e. F(dnf, ndf), and then take the reciprocal of this value. For example, the critical value with α= 0.05 with 10 numerator degrees of freedom and 8 denominator degrees of freedom is equal to the reciprocal of the critical value for α= 0.05 with 8 numerator degrees of freedom and 10 denominator degrees of freedom.

For a two-tailed test, we only use the right-tailed value of F based upon a probability of α/2 and then take its reciprocal.

Excel, depending upon the edition, has several F functions available. I’ll show you both Excel 2010 and Excel 2013. I think Excel 2007 is similar to 2010; I’m going to punt on older versions.

When we are trying to solve for critical values of F with Excel 2010, we use FINV. In Excel 2013, we use either F.INV.RT or F.INV. Note, Excel 2013 still has the FINV function but I suggest you use the newer functions as they have been refined and are a bit more accurate. (I’m assuming you know how to use functions in Excel, but if you don’t there are a number of good tutorials online – just google “how to use Excel functions.”).

Our basic right-tailed test uses F.INV.RT or FINV. Both have similar arguments: (p, v1, v2) where p is your alpha, v1 and v2 are the numerator and denominator degrees of freedom respectively.

For a left-tailed test, we use the same functions but the arguments are (p, v2, v1) and we then take the reciprocal of the value returned by the functions.

For a two-tailed test in Excel 2013, we use the F.INV function; in Excel 2010, we still but use the FINV function. In both cases, the arguments are (p, v1, v2) but this time p = α/2.

Find p given a value for F

One neat advantage to using Excel for F-tests, we can solve for any value of p if we are given a value of F. However, for homework and test purposes, I suggest you use this “technology” from Daniel Soper’s great website here. It’s a lot quicker. Thanks Dr. Soper!