Back in the dark ages when access to computers was not all that common, I was faced with developing a project schedule for, to me, a complex construction project. I was not that long out of school, so I sought out my boss with the hope he would give me some guidance on how to approach the problem.
He told me to use three-point estimation and to talk to some of the older engineers in the firm to get their ideas on the likely outcomes. So, I did and learned that the three points he was talking about were the worst case, the best case, and the most likely case for what would happen during the project. (Wikipedia, n.d.)
He also directed me to consider using PERT. I did and learned that form of project management scheduling including consideration of the optimistic time estimate (o), the most likely or normal time estimate (m), and the pessimistic time estimate (p). In PERT, instead of using probabilities for each estimate of the time required, the task time is calculated as (o + 4m + p) ÷ 6. (Taylor Jr., 2011)
To model a three-point estimate with a probability distribution you need to use a triangular distribution. Today, three-point estimates are commonly used in business and engineering, so it is somewhat surprising that Excel does not have a built-in function to help. I was recently faced with this dilemma in my quantitative methods course which I am trying to migrate away from expensive software solutions.
We were working on modeling business problems using the Monte Carlo method. In the example “Make vs Outsource” problem I wanted to use, the demand for the new SSD (solid state drive) in our case study was forecast to have a worst case, best case, and most likely value.
This is the basic model:
How could we model this demand growth as a random variable using a triangular distribution for use in a Monte Carlo simulation?
As you begin to look at a triangular distribution, there is nothing more than basic geometry and algebra required.
In the image above, a is the minimum value, b is the maximum value, and c is the most likely value, the mode. The probability distribution represented by the area in the larger triangle is continuous and, of course, equal to 1.
Recall the area of a triangle is ½ * base * height. Since the area = 1, 1 = ½ * (b-a) * h. Rearranging, we get
h = 2/(b-a).
Looking at the smaller yellow triangle, A1, that area would represent the cumulative probability distribution of x > a and x <= c, the most likely value.
P(a < x <= c), then, is equal to ½ * (c-a)*h = ½ * (c-a) * [2/(b-a)], or, with a bit of rearranging,
P(a < x <=c) = (c-a)/(b-a).
If we set a < x < b, we can then say that the cumulative probability x <= c, the most likely value,
P(<=c) = (c-a)/(b-a).
Let’s look at an example of three-point forecast increases in demand for a product:
Worst case increase = 2%; best case increase = 10%, and most likely case increase = 7%.
Thus, a = 2%, b = 10%, and c = 7%. The probability x being less than or equal to the most likely value for demand increase of 7% is
P(<=c) = (7-2)/(10-2) = 5/8 = 0.625.
Let’s look at a more general case.
For the case of x1, a < x1 < = c,
For x2, c < x2 < = b,
And again, by some algebraic manipulation, we can find the cumulative probability distributions:
So, these two equations will give the probability for an x for a triangular distribution, but how do we use them in a Monte Carlo simulation where we want to randomize the x values for a triangular distribution?
For variables that follow a normal distribution, we can use the Excel RAND function to generate probabilities and, with the NORM.INVERSE, to then generate random values of x (see image 1 for an example). So, to generate random values of x that follow a triangular distribution, we need to develop an inverse of the two CDF formulas above.
To do that, we can generate random probabilities (P1 and P2) using the RAND() function and then set them equal to the CDF for each of the two equations. Then use algebra to solve for x for each of the two cases.
For the first part:
Now, put them into Excel format and use an IF statement to pick the one to use. To do that we will use the cumulative probability that x = c as the decision point.
That is P(c ) = (c-a)/(b-a)
If P(x) <= P(c ), use the equation for x1, else, use the equation for x2.
Here is the implementation in Excel. I then link cell B6 into the Make vs Buy model for the demand and conduct the Monte Carlo simulation. the red and blue colors refer back to the two equations developed above for x1 and x2.
To see how this works, I ran a 5000 trial simulation and plotted a histogram of the x-values generated.
I think that is pretty good. And all done using basic Excel – no expensive add-ins.
You can download a copy of the calculator here. Triangular_Distribution_X-value_Calculator_Dawn_Wright_PhD
Petty, N., & Dye, S. (2013, June 11). Triangular Distributions. Retrieved from Statistics Learning Center: https://learnandteachstatistics.files.wordpress.com/2013/07/notes-on-triangle-distributions.pdf
Taylor Jr., J. (2011, July 6). The History of Project Management: Late 20th Century Process and Improvements. Retrieved from Bright Hub Project Management: http://www.brighthubpm.com/methods-strategies/11663-late-20th-century-process-and-improvements-in-pm-history/
Weissteain, R. (n.d.). Triangular Distribution. Retrieved from Wolfram Mathworld – A Worfram Web Resource: http://mathworld.wolfram.com/TriangularDistribution.html
Wikipedia. (n.d.). Three-point estimation. Retrieved from wikipedia: https://en.wikipedia.org/wiki/Three-point_estimation