Missing data? Survive Survivorship Bias with Qlik
How come some airplanes don’t return from the battlefield? Are the success stories of Bill Gates, Jeff Bezos and Mark Zuckerberg the best learning experiences? And how could people in 1987 think that cats were more likely to survive if they fell from a higher floor? All these questions have one factor in common: they suffer from “survivorship bias”.
WHAT IS SURVIVORSHIP BIAS?
If you work a lot with data, this may be a familiar term. Survivorship bias is the error of drawing conclusions only from the results (the “survivors”) that made it through a selection process, while overlooking those that did not. Incomplete data sets, a lack of context or an incorrect interpretation of data are often at the root of this misconception. If you understand why survivorship bias occurs and learn to recognize its effect, you will understand your data better and make your analyses more reliable and valid. Recent history offers numerous examples of this phenomenon: it has affected scientists, entrepreneurs and researchers, among others.
WHAT DOES A FAIL HAVE TO TELL?
In the book “The Black Swan: The Impact of the Highly Improbable”, Nassim Nicholas Taleb writes: “The cemetery of failed restaurants is very silent.” If you focus only on successes and never look into the failures, you miss the full scope of your data and never really understand how your processes actually function.
Success stories of entrepreneurs are often used as examples of how things should be done, but next to those few success stories there is a multitude of entrepreneurs who didn’t make it. Bill Gates (Microsoft), Jeff Bezos (Amazon) and Mark Zuckerberg (Facebook) are indeed successful in business, but they can only tell one side of the story: how they made it. Many others who may have taken the exact same steps, with the same talent and the same ambition, failed to make it, and their story is perhaps even more interesting. They can tell you what happened and what caused them to fail. These stories often contain lessons from which we can deduce why things go wrong. Focusing only on the “survivors” stops you from getting the overall view and finding the flaws in your processes.
“The cemetery of failed restaurants is very silent.” – Nassim Nicholas Taleb
FALLING CATS – IT’S EASY TO MISS THE BIG PICTURE
Another example of missing the big picture arose in 1987: a group of scientists investigated the likelihood that cats would survive a fall from a given floor. The researchers based their conclusions on data obtained from veterinary clinics. The data were remarkable: the researchers noted that the higher the fall, the greater the cat’s chance of survival. In fact, 100% of the cats that had fallen from the sixth floor or higher survived their fall. According to the researchers, this was possible because during such a long fall the cats reached their maximum falling speed, relaxed and then prepared for landing, resulting in a better chance of survival.
The newspaper column The Straight Dope challenged this theory 10 years later. This is a clear case of survivorship bias: the researchers only had data on cats that had actually been treated at veterinary clinics. Because their data contained no cats that had died in a fall from a higher floor, the researchers concluded that cats falling from those floors survived. The more likely explanation is the opposite: many of these cats died immediately as a result of their fall and were therefore never brought to a veterinary clinic. As a result, they were never registered and never became part of the data set.
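The effect is easy to reproduce with made-up numbers. The sketch below is a toy simulation, not the 1987 data: the chance of dying on impact (`0.08 * floor`) and the 95% recovery rate at the clinic are assumptions chosen purely for illustration. It compares the survival rate of all cats that fell from the sixth floor or higher with the rate you would see if you only looked at clinic records.

```python
import random

def simulate_falls(n_cats=10_000, seed=42):
    """Simulate falls under an assumed toy model (not real data)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_cats):
        floor = rng.randint(1, 10)
        # Assumption: the chance of dying on impact grows with the floor.
        p_dead_on_impact = min(0.08 * floor, 0.9)
        dead_on_impact = rng.random() < p_dead_on_impact
        # Only cats that are alive after impact are brought to a clinic;
        # assume 95% of those recover.
        survived = (not dead_on_impact) and rng.random() < 0.95
        records.append({
            "floor": floor,
            "at_clinic": not dead_on_impact,
            "survived": survived,
        })
    return records

def survival_rate(records):
    """Share of the given records in which the cat survived."""
    return sum(r["survived"] for r in records) / len(records)

records = simulate_falls()
high = [r for r in records if r["floor"] >= 6]
clinic_high = [r for r in high if r["at_clinic"]]

true_rate = survival_rate(high)           # every cat that fell
clinic_rate = survival_rate(clinic_high)  # only cats the clinic ever saw

print(f"True survival, floor 6+:   {true_rate:.0%}")
print(f"Clinic survival, floor 6+: {clinic_rate:.0%}")
```

In this toy model the clinic records look far more optimistic than reality, simply because the cats that died on impact never show up in them.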
AIRPLANES DURING WWII – UNCOVERING THE HIDDEN TRUTH
It is 1943: large parts of Europe are occupied by German troops. The Allies are trying to break through the enemy’s defenses with bombers, but with little success: many planes are shot down and lost. The Center for Naval Analyses starts looking for a way to reinforce the bombers. To ensure that the aircraft can still take off, the entire machine can’t be covered with an extra layer of armor: it’s necessary to choose which parts get additional protection. While the experts from the Center for Naval Analyses record where the returning planes are hit most often, the Statistical Research Group (SRG) of Columbia University is called in.
It is Abraham Wald, who fled to the U.S. in 1938 as the German troops advanced, who comes up with an unexpected conclusion: reinforce the planes where the returning machines aren’t hit. Wald arrives at this finding by reasoning that returning planes are hit in non-fatal spots: they can return despite the damage. The planes hit in other places apparently don’t make it back, and that’s why, according to Wald, it’s better to apply armor to those parts of the plane. The advice is followed and, thanks to Wald’s statistical approach to the problem, the Allies gain ground.
“The extra armor belonged not on the part of the plane that could survive a lot of bullets, but to the part of the plane that couldn’t.” – Abraham Wald
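Wald’s reasoning can be illustrated with a small simulation. This is a sketch under invented assumptions, not Wald’s actual method or data: the section names and the `LETHALITY` values (the chance that a single hit in that section brings the plane down) are hypothetical. Hits are spread evenly over the plane, yet the sections that are most dangerous to be hit in show the fewest holes on the planes that return.

```python
import random
from collections import Counter

# Assumed, illustrative chance that one hit in a section downs the plane.
LETHALITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.1, "wings": 0.05}

def fly_missions(n_planes=20_000, seed=1943):
    """Return hit counts per section for all planes and for returned planes."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    hits_on_all = Counter()
    hits_on_returned = Counter()
    for _ in range(n_planes):
        # Each plane takes 0 to 4 hits, spread evenly over the sections.
        hits = [rng.choice(sections) for _ in range(rng.randint(0, 4))]
        shot_down = any(rng.random() < LETHALITY[s] for s in hits)
        hits_on_all.update(hits)
        if not shot_down:
            # Only returning planes can be inspected on the ground.
            hits_on_returned.update(hits)
    return hits_on_all, hits_on_returned

hits_on_all, hits_on_returned = fly_missions()
for section in LETHALITY:
    print(f"{section:9s} all hits: {hits_on_all[section]:6d}  "
          f"seen on returned planes: {hits_on_returned[section]:6d}")

# The section with the fewest holes among survivors is the one to reinforce.
least_seen = min(LETHALITY, key=lambda s: hits_on_returned[s])
print("Fewest holes on returned planes:", least_seen)
```

An analyst who only sees the second column would conclude that the engine is rarely hit. The first column, which in reality nobody could observe, shows that it is hit as often as any other section.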
QLIK SENSE MAKES YOUR DATA TRANSPARENT
The cognitive engine of Qlik helps you prevent survivorship bias. In the image above, all values of Hole Location are selected (green), except “No Holes” (light gray). Qlik clearly shows which values in Plane and Status are still available (white) and which are not (dark gray). This selection in Hole Location shows that all airplanes with the status “Shot Down” fall outside the data set. In other words: airplanes with the damage shown did return, so this damage proved not to be fatal. Qlik ensures that you don’t miss any data: the different colors make it very clear what is and what isn’t part of the (selected) data set. This way you won’t overlook anything during your analysis!
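The idea behind the white and dark gray values can be mimicked in a few lines of plain Python. This is not Qlik’s API or its engine, only a sketch of the principle: the rows in `flights` and the helper `field_states` are hypothetical, and the field names are taken from the example above. Given a selection in one field, the helper reports which values of another field are still possible and which are excluded.

```python
# Hypothetical rows using the fields named in the text.
flights = [
    {"Plane": "P1", "Hole Location": "Wings", "Status": "Returned"},
    {"Plane": "P2", "Hole Location": "Fuselage", "Status": "Returned"},
    {"Plane": "P3", "Hole Location": "Tail", "Status": "Returned"},
    {"Plane": "P4", "Hole Location": "No Holes", "Status": "Returned"},
    {"Plane": "P5", "Hole Location": "No Holes", "Status": "Shot Down"},
    {"Plane": "P6", "Hole Location": "No Holes", "Status": "Shot Down"},
]

def field_states(rows, field, selection_field, selected_values):
    """Split the values of `field` into possible and excluded for a selection."""
    all_values = {r[field] for r in rows}
    possible = {r[field] for r in rows if r[selection_field] in selected_values}
    return sorted(possible), sorted(all_values - possible)

# Select every Hole Location except "No Holes", as in the example.
selected = {r["Hole Location"] for r in flights} - {"No Holes"}
possible_status, excluded_status = field_states(
    flights, "Status", "Hole Location", selected
)

print("Possible:", possible_status)   # shown white in Qlik
print("Excluded:", excluded_status)   # shown dark gray in Qlik
```

Reporting the excluded values alongside the possible ones is what makes the gap visible: “Shot Down” does not silently disappear from the result, it is listed as missing.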
Writer: Ronan Berendsen – BI Consultant Climber
Mangel, M., & Samaniego, F. J. (1984). Abraham Wald’s work on aircraft survivability.
Wald, A. (1980). A Reprint of “A Method of Estimating Plane Vulnerability Based on Damage of Survivors” (No. CRC-432).