Earlier this year, I read an analysis of the recent fatality caused by an Uber autonomous car in Arizona. (Lee, 2018) It occurred to me (and to others) that this terrible situation is possibly related to Type I and Type II errors. (Efrati, 2018)
A Type I error is a false positive: a true null hypothesis, which says that nothing is going on, is rejected. A Type II error is a false negative: a false null hypothesis is not rejected, so something really is going on, but we conclude there is nothing to act on.
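To make the two error types concrete, here is a small sketch in Python (my illustration, not part of the original analysis) that simulates many one-sample t-tests. The sample size, effect size, and significance level are assumptions chosen only for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05        # significance level (assumed for illustration)
n = 30              # sample size per simulated experiment
trials = 10_000

# Type I errors: the null is TRUE (the true mean really is 0), but we reject it anyway.
type_1 = 0
for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        type_1 += 1

# Type II errors: the null is FALSE (the true mean is 0.3), but we fail to reject it.
type_2 = 0
for _ in range(trials):
    sample = rng.normal(loc=0.3, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
        type_2 += 1

print(f"Type I  (false positive) rate: {type_1 / trials:.3f}")  # close to alpha
print(f"Type II (false negative) rate: {type_2 / trials:.3f}")
```

The Type I rate comes out near the chosen alpha by construction; the Type II rate depends on the sample size and how large the real effect is.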
In this case, the software designers were trying to optimize the ability of the car’s autonomous systems to recognize humans and other obstacles without slowing or stopping too often for things like lane pylons, trash in the gutter, or street signs on the side of the road. If the car braked for all of those, it would give a jerky, uncomfortable ride at a slower average speed. Registering a lane pylon or sign pole as an obstacle to the car would be a false positive, a Type I error. (O’Kane, 2018)
Understand that this is fuzzy, complex engineering, and the system designers had to integrate information from multiple sensors. One sensor might “see” a data point as an object better in the dark than another. One system might “see” an object and project its path as stationary, while another might calculate a motion into the car’s path. Most of the systems use artificial intelligence trained on thousands of “images” that may or may not be similar enough to this victim walking a bicycle. Logically, some overarching system has to evaluate all the data, decide that there is an object in a threatening path or position, and initiate evasive or braking action.
Again, if the “master” system were too quick to register “positive” for objects that were not threats, it would produce an unnecessarily jerky ride. My assumption is that the system designers set the detection program’s parameters to make false positives less likely, ignoring an object until the system was very sure it was a real obstacle to the car. At the same time, the system needed to avoid false negatives, failing to recognize objects that were a genuine threat. Finding the optimal setpoint is akin to picking a significance level.
Trying to minimize false positives is analogous to decreasing alpha, the level of significance, so that it is more difficult to reject the null that “nothing is going on here.” Because Type I and Type II errors are connected, as you make a false positive (Type I error) harder to commit, you also make a false negative (Type II error) more likely.
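A quick way to see that connection is to simulate it. The sketch below (again my own illustration, with every number invented for the demonstration) holds the sample size and a real effect fixed and shows the Type II error rate climbing as alpha is tightened.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, true_mean, trials = 30, 0.3, 5_000   # fixed sample size and an assumed real effect

for alpha in (0.10, 0.05, 0.01, 0.001):
    misses = 0
    for _ in range(trials):
        sample = rng.normal(loc=true_mean, scale=1.0, size=n)
        # Failing to reject a false null is a Type II error (false negative).
        if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
            misses += 1
    print(f"alpha = {alpha:>5}: Type II error rate ~ {misses / trials:.2f}")
```

The smaller you make alpha, the larger the fraction of real effects you miss, unless you also collect more data.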
After much review, Uber now says the autonomous system did “see” the pedestrian and first classified her as an unknown object about 6 seconds before impact. It then thought she and her bike were a vehicle, and only about 1.3 seconds before impact did it recognize her as a person. By then it was too late for the normal control system to react, and Uber had disabled the Volvo’s emergency braking system during autonomous operation. For testing, the company relied on backup human operators who were supposed to recognize mistakes and take appropriate action. That backup failed at a crucial time, and the autonomous system was on its own. (NTSB, 2018)
One way of thinking about this is that the sensors in the car did detect the woman who was struck, but the system initially “decided” the data were not strong enough to register her as a real obstacle and stop the car. The p-value it calculated, if you will, was greater than the alpha the system designers chose, so the system did not reject the null that “there is nothing of concern happening here.”
But the system continued to collect data, increasing the sample size n so to speak, until the test statistic finally crossed into the rejection region, though by then it was too late to save the pedestrian.
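That sequential flavor of the decision can be sketched as follows. This is only an analogy under assumed numbers (the signal strength, noise level, and very strict alpha are all hypothetical): a stream of noisy readings from a real object is re-tested as it accumulates, and the null of “nothing there” is eventually rejected, just not immediately.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.001          # assumed: demand near-certainty before declaring an obstacle
true_signal = 0.5      # the object really is there, so the null ("nothing there") is false

readings = []
for step in range(1, 201):
    readings.append(rng.normal(loc=true_signal, scale=1.0))  # one noisy reading per step
    if len(readings) < 3:
        continue                                             # need a few points to test
    p = stats.ttest_1samp(readings, popmean=0.0).pvalue
    if p < alpha:
        print(f"Null rejected at step {step} (p = {p:.5f}) -- object finally 'real'")
        break
    # Until rejection, each step is effectively a Type II decision:
    # the object is real, but the evidence is judged too weak to act on.
```

Re-testing after every new reading inflates the overall false positive rate, so a proper sequential procedure would adjust for that; the point here is only that more evidence eventually pushes the test statistic into the rejection region.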
The system initially made a Type II, false negative, decision: it failed to reject the false null that “the object is not real.” (Marshall, 2018)
The Key Takeaway
My point is that when you set your level of significance in the real world, you must consider the cost of a mistake in either direction. Evaluate the consequences of making a Type I (false positive) error and of making a Type II (false negative) error, and set your significance level accordingly.
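One way to make that trade-off explicit is to attach rough costs to each kind of error and compare the expected cost at several candidate alphas. The sketch below does this for a one-sided z-test; the costs, the prior probability that the null is true, the effect size, and the sample size are all invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical costs: a false positive (needless hard braking) is cheap,
# while a false negative (missing a real obstacle) is catastrophic.
cost_fp, cost_fn = 1.0, 10_000.0
p_null_true = 0.95          # assumed: most detections really are harmless clutter
effect, n = 0.5, 25         # assumed effect size and sample size (sigma = 1)

def type_2_rate(alpha):
    """Type II error rate of a one-sided z-test at the given alpha."""
    z_crit = stats.norm.ppf(1 - alpha)
    return stats.norm.cdf(z_crit - effect * np.sqrt(n))

for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_2_rate(alpha)
    expected_cost = (p_null_true * alpha * cost_fp
                     + (1 - p_null_true) * beta * cost_fn)
    print(f"alpha = {alpha:>5}: beta ~ {beta:.3f}, expected cost ~ {expected_cost:8.2f}")
```

With a catastrophic cost on false negatives, the larger alphas win: under these assumed numbers, the system should be willing to brake for a few phantom obstacles rather than risk missing a real one.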
Efrati, A. (2018, May 7). Uber Finds Deadly Accident Likely Caused By Software Set to Ignore Objects On Road. Retrieved from The Information: http://bit.ly/2KquXbH
Lee, T. (2018, May 7). Report: Software bug led to death in Uber’s self-driving crash. Retrieved from ARS Technica: https://arstechnica.com/tech-policy/2018/05/report-software-bug-led-to-death-in-ubers-self-driving-crash/
Marshall, A. (2018, May 29). False Positives: Self-Driving Cars and the Agony of Knowing What Matters. Retrieved from Wired: https://www.wired.com/story/self-driving-cars-uber-crash-false-positive-negative/?mbid=social_twitter
NTSB. (2018). Preliminary Report Highway HWY18MH010. Washington D.C.: National Transportation Safety Board.
O’Kane, S. (2018, May 7). Uber reportedly thinks its self-driving car killed someone because it ‘decided’ not to swerve. Retrieved from The Verge: https://www.theverge.com/2018/5/7/17327682/uber-self-driving-car-decision-kill-swerve