The 4 movies every data scientist should watch and how to use our skills responsibly.

Sole from Train in Data
8 min read · Oct 13, 2021


Data analytics movies — Image from Pixabay.

Did you come here expecting to find Moneyball, The Imitation Game, or Minority Report? This is going to be a bit different. While I don’t deny that the movie about Alan Turing, the father of computer science, is really good, in this article I want to talk about something else.

Instead, I want to highlight the 4 movies that expose the damaging effects that data science, data analytics, and big data technologies are having on our lives and our societies when they are not used responsibly.

Seriously? Does data science have damaging effects on society? Yes, it does. Just a few weeks ago, the Wall Street Journal published a series of articles based on internal Facebook documentation, which revealed a number of data misuse-related problems at Facebook. As it turns out, Facebook’s platform is used by human trafficking networks; its ranking algorithm promotes misinformation; and Instagram, owned by Facebook, can be harmful to teenage girls (1, 2).

These problems are created by Facebook’s data collection practices, the way Facebook designs its ranking algorithms, the engagement metrics it uses, and a series of (poor) decisions that put profit above people. If this is not data science, then what is it?

Sadly, “The Facebook Files” are not the first report on data misuse and its damaging effects on our societies. Over the past few years, a number of books (3, 4), movies (5, 6, 7, 8), and organizations (9, 10) have begun to raise awareness of these issues, highlighting how the misuse of data combined with persuasive technologies, poor decision-making, and a lack of regulation are threatening our health, our livelihoods, and our societies.

Why does this matter to data scientists and programmers? We data professionals could be using our skills to help people and societies. There is certainly a lot to be gained from the use of predictive analytics and related technologies. We could be helping teams, like in Moneyball. Or we could inflict real damage through the data products that we create if we don’t use our skills responsibly. After all, companies are run by people, decisions are made by people, and algorithms and data products are designed by (some of) us, data scientists.

Data science movies

The following 4 “data science movies” are documentaries, dramas, or a blend of both. They expose the damaging effects of the misuse of big data and artificial intelligence on politics, health, and various sectors of society; they reveal why these products inflict damage and how companies ended up creating them; and, in some cases, they also offer suggestions on how to move forward.

The Social Dilemma

The Social Dilemma is perhaps the most famous of these documentaries, with 7 nominations and 2 Emmy awards this year, and it is available on Netflix. The Social Dilemma reveals how social media platforms create addiction and manipulate people’s opinions and behaviors while spreading conspiracy theories, fake news, and disinformation. These issues are created through the misuse of persuasive technologies designed to capture users’ attention and retain them on the platforms for as long as possible. Algorithms are designed to optimize the time users spend on the platform and/or the amount of content they consume, regardless of the quality or accuracy of that content.

According to the film, social media platforms compete for users’ attention because that is how they generate profit: through the paid advertisements shown to users.

The film features interviews with many former employees and executives of big tech companies, like Google, Facebook, and Twitter, who offer a first-hand look at what goes on in algorithm design. You can watch the trailer here.

Coded Bias

The film Coded Bias is a multi-award-winning documentary that follows the journey of MIT Media Lab researcher Joy Buolamwini, who discovered that facial recognition algorithms do not properly recognize the faces of people of color and women. This finding shows that the artificial intelligence tools we use, which are supposed to reflect real life as closely as possible, do not actually do such a good job. Or at least not for everybody. In other words, many of the artificial intelligence tools we use today discriminate: they behave in racist or sexist ways. And this has dramatic consequences for various sectors of our society.

The film explores the reasons behind this bias and its consequences across different groups in the population, highlighting that the most vulnerable sectors of society are hit the worst. You can watch the trailer here.

The Great Hack

The Great Hack is a documentary film about the Facebook-Cambridge Analytica data scandal, perhaps the biggest scandal before “The Facebook Files” mentioned earlier. The film shows how Cambridge Analytica’s misuse of data and data analysis in targeted advertising campaigns disrupted politics in various countries, including the Brexit referendum in the UK and the 2016 elections in the US. Perhaps unsurprisingly, the film also exposes the relationship between Cambridge Analytica and the social media giant Facebook, and how users’ data were shared without their knowledge.

The film features a journalist from The Guardian who broke the story and a former employee of Cambridge Analytica who turned whistleblower. You can watch the trailer here.

Brexit: The Uncivil War

Brexit: The Uncivil War is a drama based on the true story of the strategy behind the “Vote Leave” campaign ahead of the referendum in which the UK voted to leave the European Union. The film shows how the (mis)use of social media and the internet as a marketing tool, through micro-targeted advertisements, contributed to the UK’s separation from the European bloc. The film features an outstanding performance by Benedict Cumberbatch as the mastermind behind the campaign strategy. You can watch the trailer here.

Can we prevent the misuse of data technologies?

These movies highlight the damaging impact that the misuse of data, data analysis, and big data technologies is having on our livelihoods and our societies today. These problems arise largely from the lack of regulation around the use of data, which allows tech companies to do pretty much whatever they want. So the solution seems simple: we need a whole lot more regulation to prevent the misuse of these technologies and to incentivize their use for the well-being of people and communities instead of, well, for profit.

Unfortunately, due to the enormous amount of money spent by big tech on lobbying (11, 12, 13, 14, 15), the implementation of new regulations is going to be a slow and painful process. In fact, Google, Facebook, and Microsoft are the three biggest lobbying spenders in the European Union (11, 13) and Facebook and Amazon are now the two biggest corporate lobbying spenders in the US (15).

So it seems, for the time being, it is down to us, data scientists and programmers, to try and do something to prevent the misuse of technology. After all, we are the ones who create these products. Aren’t we?

So, what can we do?

Most of the movies discussed here feature former employees of big tech companies who either raised their voices and were not heard, or decided to leave their company because they realized that the technologies they were creating were not serving people’s best interests. If we happen to be working towards creating damaging technologies, we could do the same. Let’s raise our voices, raise our concerns, and try to steer product design towards a useful purpose.

Some former tech employees went on to found NGOs that fight for a humane use of technology, like the Center for Humane Technology, or for the accountable and equitable use of artificial intelligence, like the Algorithmic Justice League. Jump onto their websites and see how you can help.

In fact, the Center for Humane Technology has a lot of resources for people working on the design of data products, to support us in building humane technology and to help us steer the discussion within our companies. And speaking of steering discussions…

Let’s focus the discussion on data science and data analysis technologies.

You, like me, have probably read tons of articles on the internet saying that data science is “the sexiest job of the 21st century” and that data science, machine learning, and software engineering pay some of the highest salaries in the market. And I am not sure about you, but when I talk to (most of) my colleagues, it all seems to revolve around money, around using the latest technologies, and around working for some of these “prestigious” (big tech) companies. Those seem to be the hallmarks of success. And very rarely do I hear discussions or reflections on whether what we do is actually meaningful or even useful at all.

So let’s steer the conversation. Let’s change the way we talk about data science. Instead of tying success to the size of the paycheck or the company we work for, let’s think for a moment about whether what we are going to build with our skills, in that company and for that money, is actually useful. Are our data products improving society’s well-being in any way? Are we helping prevent crime? Improving health? Promoting positive interactions? Are our algorithms fair?

And if we are not happy with the answers, why not just kindly decline that job offer and move on?

I think it is important for us, data scientists, those who actually have the skills to create products that consume data and make decisions based on data, to understand that not every product is a good product, and that some can indeed be very harmful. We do have a say, and we are therefore responsible for what we create. And we have this unique opportunity to create products that serve people and communities and to abstain from creating products that are damaging or ethically irresponsible.

Regulation on how data and algorithms can and can’t be used will eventually emerge, but given the amount of money the big tech companies spend on lobbying against it, that is going to take a while. Until then, it is down to us, data scientists, machine learning engineers, and everyone else with the skills to create these products, to use our skills responsibly, to say, “no, I don’t want to be part of this,” and to put our skills to better use.

Are you on board?

If you enjoyed this article, you might be interested in the best machine learning books on related topics.

Check out our courses at Train in Data | Subscribe to our newsletter.


Written by Sole from Train in Data

Data scientist, book author, online instructor (www.trainindata.com) and Python open-source developer. Get our regular updates: http://eepurl.com/hdzffv
