Although I work at a tech company, I have no technological background whatsoever. I graduated in linguistics from the Department of Humanities, meaning I had to sit through compulsory philosophy and ethics classes. It got me thinking: how am I going to use all this? And then it hit me. There is no better time to talk about Machine Learning ethics than now. ML is spreading, and however useful, it must also be treated with caution. Let’s see why.
Whether you are aware of it or not, Machine Learning is already affecting our daily lives. You’ve probably come across ML in some form: Spotify’s Discover Weekly? Netflix’s movie recommendations? Amazon’s book picks? But what happens if you watch a movie that Netflix recommended and you don’t like it? Well, you lose a few hours of your life. Annoying, but no biggie, right? There are some areas, though, where ML gone wrong can have serious consequences for people’s lives.
One infamous case where an ML algorithm made choices with questionable results is Amazon’s attempt to introduce ML into its human resources processes. The idea was that the algorithm would screen a large number of CVs and leave only the top applicants for human evaluation. After using the system for some time, however, it turned out that it had started discriminating against women: a CV got negative points when it mentioned some kind of all-female association, or if the candidate had graduated from an all-female school.

The criminal justice system has its own examples. Facial recognition has long been used by the police, and now ML is being used to determine how likely a criminal is to reoffend. What’s wrong with all this? Well, if the algorithm evaluates a person based on the wrong input, it can lead to a biased decision. This is exactly what happened: the system ended up discriminating against African-American people. These examples show that Machine Learning ethics has never been more important, so it is crucial to get it right if you use ML to make decisions.
But how?
First of all, to understand Machine Learning ethics, you must be aware of Machine Learning bias and of what ML basically is. Machine Learning is often referred to as a ‘black box’: although developers and data scientists may be able to create an algorithm, the inner workings of ML are not quite clear to us humans. This is also why Machine Learning bias, also known as algorithmic bias (a crucial concept of Machine Learning ethics), is hard to grasp. What we do know is that Machine Learning bias comes from the collection or the use of data. Similarly to bias in the traditional sense, when it occurs, the system draws inaccurate conclusions from the set of data it uses. But functions and algorithms cannot be sexist or racist, right? Then how can bias still creep into Machine Learning? Well, machines and ML algorithms are built by humans – who, by nature, can be judgmental and biased. Now let’s see what exactly Machine Learning bias is.
There are many ways to classify Machine Learning bias. For the sake of simplicity, I’m only going to mention the two most important and most common types. They have different sources, and preventing each of them requires different measures.
Machine Learning pre-existing biases (or dataset biases) are not the result of the coding itself; in fact, they have little to do with ML algorithms at all. The point of pre-existing bias is that it is not the product of a bad system – it exists independently of the system. And it is simpler than you think. One common example is any application where you have to select your gender and only ‘male’ or ‘female’ is available. Part of the data set is missing, and the system ignores the fact that some people identify as something other than these two categories. Is this the result of bad code? Not really.
According to Packt, “it’s true that just about every data set will be ‘biased’ in some way,” because data is only ever a representation of something. What you can do about it is make sure that your data is as accurate as possible and that it represents what you actually intend it to represent, as clearly as possible. You also have to be aware of the extent of the bias and of the effect it may have on your ML algorithm.
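To make that a bit more concrete, here is a minimal sketch of what such a representation check might look like, assuming your data sits in a pandas DataFrame. The ‘gender’ column and the reference shares are purely hypothetical:

```python
# A sketch of a representation check with pandas. The "gender" column
# and the expected population shares below are purely hypothetical.
import pandas as pd

def representation_report(df: pd.DataFrame, column: str, expected: dict) -> pd.DataFrame:
    """Compare each category's share in the data with the share it
    should have in the population you intend to represent."""
    observed = df[column].value_counts(normalize=True)
    rows = []
    for category, expected_share in expected.items():
        observed_share = float(observed.get(category, 0.0))
        rows.append({
            "category": category,
            "expected": expected_share,
            "observed": round(observed_share, 3),
            "gap": round(observed_share - expected_share, 3),
        })
    return pd.DataFrame(rows)

# A toy CV-screening dataset that should mirror the applicant pool:
cvs = pd.DataFrame({"gender": ["male"] * 80 + ["female"] * 18 + ["non-binary"] * 2})
print(representation_report(cvs, "gender",
                            {"male": 0.50, "female": 0.48, "non-binary": 0.02}))
```

A large ‘gap’ for any category is a warning sign that the data set represents something other than what you intended.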
According to Wikipedia, technical Machine Learning bias creeps into the system via limitations of a “program, computational power, its design, or other constraint on the system.” For example, when a random number generator is not capable of producing true randomness (which is still one of the biggest challenges of computer science), the results may be biased, too. Bias originating in design occurs, for example, in search engines, where only a limited number of results fit on a page; this way, the algorithm “privileges” the top results over the rest. In short, technical bias is about how the algorithm itself is developed or how the model was trained.
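A tiny, self-contained illustration of how a pure system constraint can skew results: squeezing a uniform 0–255 byte into ten buckets with a modulo operation makes some buckets systematically more likely than others, without anyone writing a single ‘unfair’ line of code.

```python
# Modulo bias: 256 source values don't divide evenly into 10 buckets,
# so remainders 0-5 occur 26 times each, while 6-9 occur only 25 times.
from collections import Counter

counts = Counter(b % 10 for b in range(256))
print(counts)  # buckets 0-5 -> 26 hits, buckets 6-9 -> 25 hits
```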
An interesting subtype is contextual bias: “It occurs when the programmed elements of an algorithm fail to properly account for the context in which it is being used. A good example is the plagiarism checker Turnitin – this used an algorithm that was trained to identify strings of texts, which meant it would target non-native English speakers over English speaking ones, who were able to make changes to avoid detection.” (Packt)
Whenever people work on a new ML model, they tend to think only about the implementation and the architecture. Nowadays, however, it is just as crucial to keep the ethical implications in mind. Society and humanity are changing rapidly, and neither people nor technology should ignore how these changes affect our lives. So before starting an ML project, consider where bias could creep in and how you would counter it.
One example of a technique for eliminating bias was presented in a paper by Google: the authors suggest viewing a biased dataset as an unbiased dataset that has been manipulated by a biased agent. The technique helps re-evaluate the data to make it fit the (theoretical) unbiased dataset; only then is the data fed into the ML algorithm.
“The technique achieves state-of-the-art results in several common fairness tests, while presenting relatively low error rates.”
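As a rough sketch of the underlying idea – and explicitly not the paper’s exact algorithm – here is the classic ‘reweighing’ pre-processing scheme of Kamiran and Calders: each training example gets a weight that makes the protected attribute and the label look statistically independent, and training then runs on the weighted data. The column names and numbers are hypothetical.

```python
# Reweighing (Kamiran & Calders): give each example the weight
# w(g, y) = P(g) * P(y) / P(g, y), so that the weighted group/label
# frequencies look as if group and label were independent.
import pandas as pd

def reweigh(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / len(df)
    return df.apply(
        lambda row: (p_group[row[group_col]] * p_label[row[label_col]])
        / p_joint[(row[group_col], row[label_col])],
        axis=1,
    )

# Hypothetical CV-screening data: group A receives far more positive labels.
data = pd.DataFrame({
    "group": ["A"] * 8 + ["B"] * 8,
    "hired": [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
})
data["weight"] = reweigh(data, "group", "hired")
print(data.groupby(["group", "hired"])["weight"].first())
# Positive examples from the under-represented group B get a much larger
# weight, which any estimator accepting sample_weight can then use.
```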
OK, let’s assume you managed to come up with an ML system that is not biased in any way. Now what? You must also think about the future. There is still a chance that as the machine ‘learns,’ it will pick up features that lead it to discriminate among data points (or people, for that matter). Chief Executive introduced a nice term for this: the concept of Day 13. It comes from the idea that after spending a long time testing and optimizing an ML solution, you put the system into practice (on Day 1), everybody is happy, and that’s it – job done. And that’s where you can get it very wrong. Look at your solution on Day 13 instead. Is it still running? Is it working the way it’s supposed to? These are the questions you and your team need to focus on, and not only on the day of the release itself. The key to a successful Machine Learning solution is not that it’s perfect on Day 1; you need to make sure it is still perfect on Day 13 (or Day 406, or whichever day – the point of the concept is that the amount of time passed since the release has no significance whatsoever).
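In practice, a ‘Day 13’ check can be as simple as comparing the decisions your model makes today with the baseline you measured at release. Here is a minimal sketch, assuming you log each decision together with the group it concerns; the group names, baseline rates, and alert threshold are all hypothetical:

```python
# A minimal "Day 13" drift check on logged model decisions.
import pandas as pd

BASELINE_POSITIVE_RATE = {"A": 0.42, "B": 0.40}  # measured at release (Day 1)
ALERT_GAP = 0.10  # how much drift you tolerate before investigating

def day_13_check(decisions: pd.DataFrame) -> list:
    """Flag groups whose recent positive-decision rate drifted from baseline."""
    alerts = []
    recent = decisions.groupby("group")["decision"].mean()
    for group, baseline in BASELINE_POSITIVE_RATE.items():
        drift = abs(float(recent.get(group, 0.0)) - baseline)
        if drift > ALERT_GAP:
            alerts.append(f"group {group}: positive rate drifted by {drift:.2f}")
    return alerts

recent_log = pd.DataFrame({
    "group": ["A"] * 10 + ["B"] * 10,
    "decision": [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8,  # B has dropped to 0.2
})
for alert in day_13_check(recent_log):
    print("Day 13 alert:", alert)
```

The exact metric matters less than the habit: schedule the check, and don’t wait for complaints to tell you the system has drifted.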
As you can see from the examples above, even with the purest of intentions you can create an algorithm that is biased in some way. And even if you cannot nip Machine Learning bias in the bud, there are ways to continuously improve the system and still come up with an ML solution that represents your data appropriately and gets exactly the right information in the right format, without any judgement issues. Ethics in ML is a super-interesting (and seemingly hard to grasp) interdisciplinary area. But once you get the basics right, there is no need to be afraid of technology. It was designed to help humans – which it can, and it does. Machine Learning ethics becomes an issue when an algorithm is released into the ‘real world.’ Once you’re ready to keep up with it there – only then is it ‘job done.’