Podcast: Lying about Lying - and faking data in Excel
by meep
I go through the story of some social science experiments on dishonesty whose data appear to have been altered from their original form. I first wrote about these experiments back in 2013, then had an update in 2021 when data scientists at Data Colada found altered data from one of the studies, and in 2023 they found further evidence of data falsification. To detect the falsified data, they rely both on features of the altered data itself and on the ways Microsoft Excel works.
Episode Notes
Data Colada links
17 Aug 2021: 98 Evidence of Fraud in an Influential Field Experiment About Dishonesty
Interlude: Calibri and Cambria
Perhaps the most peculiar feature of the dataset is the fact that the baseline data for Car #1 in the posted Excel file appears in two different fonts. Specifically, half of the data in that column are printed in Calibri, and half are printed in Cambria. Here’s a screenshot of the file again, now with a variable we added indicating which font appeared in that column. The different fonts are easier to spot if you focus on the font size, because Cambria appears larger than Calibri. For example, notice that Customers 4 and 5 both have a 5-digit number in “baseline_car1”, but that the numbers are of different sizes:
17 June 2023: 109 Data Falsificada (Part 1): “Clusterfake”
Two summers ago, we published a post (Colada 98) about a study reported within a famous article on dishonesty. That study was a field experiment conducted at an auto insurance company (The Hartford). It was supervised by Dan Ariely, and it contains data that were fabricated. We don’t know for sure who fabricated those data, but we know for sure that none of Ariely’s co-authors – Shu, Gino, Mazar, or Bazerman – did it [1]. The paper has since been retracted.
That auto insurance field experiment was Study 3 in the paper.
It turns out that Study 1’s data were also tampered with…but by a different person.
That’s right:
Two different people independently faked data for two different studies in a paper about dishonesty.
20 June 2023: 110 Data Falsificada (Part 2): “My Class Year Is Harvard”
As mentioned above, students in this study were asked to report their demographics. Here is a screenshot of the posted original materials, indicating exactly what they were asked and how:
….A less reasonable response is “Harvard”, an incorrect answer to the question. It is difficult to imagine many students independently making this highly idiosyncratic mistake. Nevertheless, the data file indicates that 20 students did so. Moreover, and adding to the peculiarity, those students’ responses are all within 35 rows (450 through 484) of each other in the posted dataset:
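To get a feel for how unusual that kind of clustering is, here is a rough Python sketch. It is not the analysis Data Colada ran. It takes the two numbers from the excerpt (20 odd responses, all within rows 450 through 484) and asks how often 20 randomly placed rows would land in a stretch that tight. The total file length of 600 rows is a made-up assumption, since the excerpt does not give it, and the function names are hypothetical.

```python
import random


def span(rows):
    """Number of rows from the first flagged row to the last, inclusive."""
    return max(rows) - min(rows) + 1


def share_as_tight(n_rows, k, observed_span, trials=2000, seed=0):
    """Share of random placements of k rows among n_rows whose span is
    no wider than observed_span (simple Monte Carlo, fixed seed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        placement = rng.sample(range(1, n_rows + 1), k)
        if span(placement) <= observed_span:
            hits += 1
    return hits / trials


# Rows 450 through 484 come from the quoted passage.
observed = span([450, 484])

# n_rows=600 is a hypothetical file length, used only for illustration.
share = share_as_tight(n_rows=600, k=20, observed_span=observed)
print("observed span:", observed)
print("share of random placements at least as tight:", share)
```

A shuffle-based check like this only says that the clustering is hard to explain by chance ordering; it does not by itself say how the rows got there.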
Books by two of the (possibly fraudulent) researchers
Rebel Talent: Why It Pays to Break the Rules at Work and in Life by Francesca Gino
The Honest Truth About Dishonesty: How We Lie to Everyone—Especially Ourselves by Dan Ariely
My actuarial articles
Feb 2013, The Stepping Stone: Everybody Cheats, at Least Just a Little Bit – A Review of The (Honest) Truth About Dishonesty, by Dan Ariely
As with many forms (especially tax forms), one generally fills out an insurance application and then, at the end, signs a statement attesting that all the information above is truthful. Ariely wanted to check the effect of having people sign such a statement before filling out an auto insurance application. Specifically, he looked at the self-reported miles driven per year (as more miles driven results in higher premiums, usually). The result? Those with the attestation at the top of the form reported, on average, 2400 more miles driven than those who had the attestation in the standard place: the bottom. Put the other way, signing at the bottom went with about a 9 percent reduction in reported miles, which reflects the marginal aspect of “normal” cheating.
Or maybe that was the level of data falsification they thought they could get away with.
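The two figures in that excerpt, a gap of 2400 miles and a reduction of about 9 percent, pin down roughly what the group averages must have been. The short Python sketch below does that back-of-the-envelope arithmetic; it uses only the rounded numbers quoted above, so the results are approximations and not figures taken from the study itself. The variable names are just for illustration.

```python
# Rounded figures from the excerpt above.
difference_miles = 2400   # gap between the two groups' average reported miles
reduction = 0.09          # "about a 9 percent reduction"

# If the gap is 9 percent of the higher average, the higher average is gap / 0.09.
implied_higher_average = difference_miles / reduction
implied_lower_average = implied_higher_average - difference_miles

print(round(implied_higher_average), round(implied_lower_average))
```

So the averages implied by the excerpt are somewhere in the mid-twenty-thousands of miles per year, give or take the rounding in the quoted figures.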
November 2021, The Stepping Stone, Distrust and Verify
However, I have learned to be a little more suspect of such “one simple trick” methods for making insurance easier.
In the age of InsurTech and increased risks in a pandemic, a bit of skepticism in the face of gee-whiz claims is useful. It does feel a little like righteous thinking “what myopic management!” and I should have distrusted the result just due to my own biases. Here I thought I was being a sophisticate, and I was just as gullible as people who believe clickbait articles.
One thing this did cement for me is the importance of our own professional standards as actuaries. If an actuary were involved in falsifying this data, there would be consequences in terms of the danger of having one’s credentials suspended or being expelled entirely from actuarial organizations. Actuaries’ credibility has been hard-won through a long history of not only developing better practices, and better standards, over time, but also policing those standards.
In light of those standards, I must respectfully retract part of my old article.
And I must remember to be more skeptical in the future.
Related STUMP posts
NOVEMBER 24, 2021
Covid, Ghostbusters, and Fraudulent Data
MAY 23, 2022
Don’t perpetrate financial fraud in spreadsheets