Although I work at a tech company, I have no technological background whatsoever. I graduated in linguistics from the Department of Humanities, meaning I had to sit through compulsory philosophy and ethics classes. It got me thinking: how am I going to use all this? And then it hit me. There is no better time to talk about Machine Learning (ML) ethics than now. ML is spreading, and however useful, it must also be treated with caution. Let’s see why.
Machine learning – guilty or innocent?

Whether you are aware of it or not, Machine Learning is already affecting our daily lives. You’ve probably come across ML in some form. Spotify’s Discover Weekly? Netflix’s movie recommendations? Amazon’s book picks? But what happens if you watch a movie that Netflix recommended for you and you don’t like it? Well, you lose a few hours of your life. Annoying, but no biggie, right?

But there are some areas where ML gone wrong can have serious consequences for people’s lives. One infamous case where an ML algorithm made choices with questionable results is when Amazon introduced ML into their human resources processes. The main idea was that the ML algorithm would screen a number of CVs and leave only the top applicants for human evaluation. However, after the system had been in use for some time, it turned out that it had started discriminating against women: a CV got negative points when it mentioned some kind of all-female association, or if the candidate had graduated from an all-female school.

In the criminal justice system, facial recognition has been used for a long time by the police. Now ML is also being used to determine how likely a criminal is to reoffend. What’s wrong with all this? Well, if the algorithm evaluates a person based on the wrong input, it can lead to a biased decision. This is exactly what happened: one such risk-assessment system ended up discriminating against African-American people.

These examples all show that ML ethics has never been more important, so if you use ML when making decisions, it is crucial that you get it right. But how?
Machine learning bias

First of all, you must be aware that Machine Learning is often referred to as a ‘black box.’ This means that although developers and data scientists may be able to create an algorithm, the inner workings of ML are not quite clear to us humans. This is also why ML bias, also known as algorithmic bias (a crucial concept in ML ethics), is hard to grasp. What we do know is that ML bias comes from the collection or the use of data. As with bias in the traditional sense, when it occurs, the system draws inaccurate conclusions from the data set it uses. But functions and algorithms cannot be sexist or racist, right? Then how can bias still creep into Machine Learning? Well, machines and ML algorithms are built by humans, who, by nature, can be judgmental and biased. Now let’s see what exactly Machine Learning bias is.
Types of bias and how to get rid of them

There are many ways to classify Machine Learning bias. For the sake of simplicity, I’m only going to mention the two most important and most common types. They come from different sources, so preventing each of them requires different measures.
Type #1: The devil is in the… data: pre-existing biases

Pre-existing biases (or dataset biases) are not the result of coding itself; actually, they have little to do with ML algorithms at all. The point of pre-existing bias is that it is not the result of a bad system; it exists independently of the system. And it is simpler than you think. One common example is any application where you have to select your gender, and only male or female are available. Some part of the data set is missing, because the system ignores the fact that some people identify as something other than these two categories. Is this a result of bad code? Not really. According to Packt, “it’s true that just about every data set will be ‘biased’ in some way” because the data is only a representation of something. What you can do about it is make sure that your data is as accurate as possible and that it represents, as clearly as possible, what you actually intend it to represent. You also have to be aware of the extent of the bias and what effect it may have on your ML algorithm.
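To get a feel for the extent of dataset bias, one option is to compare how often each group appears in your data with how often you would expect it to appear. The snippet below is only a minimal sketch of that idea: the function names, the sample records, the reference shares and the `tolerance` threshold are all made up for illustration, not taken from any real data set or library.

```python
from collections import Counter


def group_shares(records, field):
    """Return each value's share of the records for the given field."""
    counts = Counter(record.get(field, "missing") for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def underrepresented(shares, expected, tolerance=0.5):
    """List groups whose share is below tolerance * expected share (or absent)."""
    return sorted(
        group
        for group, target in expected.items()
        if shares.get(group, 0.0) < tolerance * target
    )


# Hypothetical sample: a sign-up form that only offered two gender options.
records = [
    {"gender": "female"},
    {"gender": "male"},
    {"gender": "male"},
    {"gender": "male"},
]

# Hypothetical reference shares for the population the data should represent.
expected = {"female": 0.49, "male": 0.49, "non-binary": 0.02}

shares = group_shares(records, "gender")
flagged = underrepresented(shares, expected)

print(shares)
print(flagged)
```

With this made-up sample, the check flags the category that the form never offered. It cannot tell you why a group is missing, only that it is, so it is a starting point for questions rather than a fix.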
Type #2: Technical biases

According to Wikipedia, technical bias creeps into the system via limitations of a “program, computational power, its design, or other constraint on the system.” For example, when a random generator is not capable of producing true randomness (which is still one of the biggest challenges of computer science), the results may also be biased. Bias originating in design occurs, for example, in search engines, where only a limited number of results fit on a page. This way, the algorithm “privileges” the top results over the rest. Technical bias is about how the algorithm itself is developed or how the model was trained. An interesting subtype is contextual bias.
“It occurs when the programmed elements of an algorithm fail to properly account for the context in which it is being used. A good example is the plagiarism checker Turnitin – this used an algorithm that was trained to identify strings of texts, which meant it would target non-native English speakers over English speaking ones, who were able to make changes to avoid detection.” (Packt)
How to get rid of Machine Learning Bias

When working on a new ML model, people tend to think only about the implementation and the architecture. However, nowadays it is also crucial to keep the ethical implications in mind. Society and humanity are rapidly changing, and neither people nor technology should ignore how these changes are affecting our lives. Whenever you work on an ML project, consider these points first:
- The aim of the algorithm: What would you like your ML algorithm to achieve? Set clear goals. If you find your goals hard to define, you may want to work on them a bit more.
- Real-life implications of your ML model: This is where things get tricky, especially from an ethical point of view. When developing a solution, keep your goal in mind. That is, how, and more importantly, why do you want to use it? Think about what your algorithm will achieve in the real world.
- Eliminate pre-existing biases as much as you can: Make sure that the data collection methods and the data itself are in line with the goals of the algorithm and what you want to achieve with your ML solution. Collect as much data as you can. Cross-check and document everything during the data collection stage to eliminate bias as much as possible.
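One way to look at the real-life implications of a model is to compare how often it selects people from each group. The snippet below is a rough sketch of such a check, not a complete fairness audit: the function names, the group labels and the outcomes are invented for illustration. A commonly cited rule of thumb, the “four-fifths rule” from US employment guidelines, treats a ratio below 0.8 as a warning sign worth investigating.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Map each group to the fraction of its members the model selected."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group
    return min(rates.values()) / highest


# Hypothetical screening outcomes: (group, passed the automated screen?)
decisions = (
    [("women", True)] * 2 + [("women", False)] * 8
    + [("men", True)] * 5 + [("men", False)] * 5
)

rates = selection_rates(decisions)
ratio = impact_ratio(rates)

print(rates)
print(f"impact ratio: {ratio:.2f}")
```

In this invented example the ratio is well below 0.8, which would be a reason to go back to the data and the goals of the algorithm before trusting its decisions.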
“The technique achieves state-of-the-art results in several common fairness tests, while presenting relatively low error rates.”