
The use of certain artificial intelligence algorithms has a great impact on people's lives: it can affect admission to a school, the length of a sentence for a defendant, the choice of a medical treatment or the granting of a mortgage. In all these cases it is not easy to determine whether the algorithm is right and, as we saw in a previous article in this publication, the problem lies not only in identifying possible 'wrong' decisions, but also 'biased' ones, i.e. decisions influenced by biases that the algorithm has incorporated at some stage of its development.
There is another factor that complicates matters: artificial intelligence algorithms are often 'black boxes', to which we present an instance of a problem and from which we receive an answer. What happens inside the box, how the algorithm 'manufactures' the answer we receive, remains hidden from us: either because of how the algorithm is constructed, or because the algorithm is owned by a company that does not release its code. As one can easily imagine, this further complicates the problem. If a financial advisor denies us a loan, we can always ask what the reasons for the denial are, and the advisor will have to justify the decision. But how do you get an explanation out of a black box?
XAI (eXplainable Artificial Intelligence) has tried in recent years to answer this question by developing techniques that 'explain' the decisions of black box algorithms. But despite the many explanation methods produced, some critical issues remain. Let us try to explain why, at least for problems that can have a significant impact on people's lives and on society, it would be better to do without black boxes.
Explanation techniques for algorithms
Let us try to imagine algorithms as mechanisms governed by thousands of parameters, which we can visualise as 'knobs': each algorithm has at least hundreds of thousands of them, and the largest easily reach a billion. The knobs must be set to certain values for the algorithm to function correctly. We do not know these values, but during the algorithm's 'training' phase the knobs move on their own, a little at a time. In fact, we do not know whether the values the knobs reach at the end of training are the best possible, nor do we know whether the algorithm will always work well with these values. We can only trust that, if the training was done well, the knobs have ended up in the right place.
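To make the idea concrete, here is a minimal sketch in code, built on an invented toy problem rather than any real system: the 'knobs' are just numbers, and training nudges them a little at a time to reduce the error on known examples.

```python
# A minimal, invented sketch of training: the "knobs" (parameters) are
# nudged a little at a time to reduce the error on known examples.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # 100 examples, 3 input features
true_w = np.array([1.5, -2.0, 0.5])                 # the values we would like to recover
y = X @ true_w + rng.normal(scale=0.1, size=100)    # answers we want to reproduce

w = rng.normal(size=3)        # the "knobs", starting at random values
learning_rate = 0.01

for step in range(1000):
    error = X @ w - y                   # how wrong we currently are
    gradient = X.T @ error / len(y)     # which way to nudge each knob
    w -= learning_rate * gradient       # move each knob a little

print(w)   # the knobs end up close to true_w, but nothing guarantees
           # they are the best possible values
```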
At the same time, we do not really know the role of each knob either. In general, we know that without training, with values chosen at random, the algorithm does not work. But we have no idea whether, after training, changing the values of just a few knobs will make the algorithm keep working in the same way, improve or worsen. In fact, we cannot even tell whether all the knobs we are using are really needed. In short, it is as if we were looking at a very complicated jigsaw puzzle whose pieces have fallen into place on their own, and which, even if we wanted to, we would not be able to put together from scratch.
For this reason, scientists have come up with various techniques to 'explain' what an algorithm does without trying to understand the role of each individual knob. Some methods try to represent these very complicated algorithms with simpler ones that behave in the same way, at least on some problems. It is like discovering that, on some problems, the large original algorithm behaves like an algorithm with only ten knobs, whose workings we know exactly, and using the latter to explain how the former works.
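As a rough illustration of this idea, the sketch below (with an invented dataset, and model choices that are purely illustrative) trains a small, readable model to imitate the answers of a black box, a technique often called a 'surrogate model'.

```python
# A hedged sketch of a surrogate model: a small, readable decision tree is
# trained to imitate the answers of a black box on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# Stand-in for the "black box": a large ensemble model.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is trained on the black box's *predictions*, not the true labels:
# it tries to behave like the black box, at least on these examples.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Agreement between the two models tells us how faithful the explanation is.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"fidelity: {fidelity:.2f}")
print(export_text(surrogate))   # a handful of readable if/then rules
```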
Another method of explanation consists in highlighting which parts of a problem matter most for the decision. For example, for an algorithm whose task is to label images, it is possible to obtain a graphic representation of which parts of the image are most important in determining the label. In this graphic representation, called a saliency map, the most important pixels are shown in a darker colour.
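One common way to build such a map, among several, is to measure how sensitive the score of the chosen label is to each pixel. The sketch below shows the mechanics on a toy, untrained network; it illustrates the gradient-based approach in general, not any specific tool.

```python
# A minimal sketch of a gradient-based saliency map: pixels whose small
# changes most affect the chosen score are taken as the "most important".
# The model is an untrained toy network, used only to show the mechanics.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
model.eval()

image = torch.rand(1, 1, 28, 28, requires_grad=True)  # stand-in for a real image

score = model(image)[0].max()   # score of the predicted label
score.backward()                # how does each pixel influence that score?

saliency = image.grad.abs().squeeze()   # 28x28 map: larger value = more important pixel
print(saliency.shape)
```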
Finally, one can try to explain the reasoning of an algorithm in a 'counterfactual' manner. For example, an algorithm according to which a bank should not grant a loan might give as a counterfactual explanation the sentence: 'if the applicant had a permanent job, the loan would be granted'.
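In its simplest form, a counterfactual can be found by trying small changes to the applicant's data and checking whether the decision flips. The sketch below uses an invented decision rule and invented features, purely to illustrate the idea.

```python
# A hedged sketch of a counterfactual explanation for a loan decision:
# we look for a change to the applicant's data that flips the outcome.
# The features and the decision rule are invented for illustration.

def model(applicant):
    # stand-in for a black box: approves only with a permanent job
    # and an income above a threshold
    return applicant["permanent_job"] and applicant["income"] > 20000

applicant = {"permanent_job": False, "income": 35000}
print(model(applicant))   # False: loan denied

# Try changing one feature at a time and see whether the decision flips.
candidate_changes = [
    {"permanent_job": True},
    {"income": applicant["income"] + 10000},
]
for change in candidate_changes:
    modified = {**applicant, **change}
    if model(modified):
        print(f"Counterfactual found: with {change} the loan would be granted")
```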
Explanations, but limited
Although there are methods to explain the behaviour of black boxes, they do not always provide the explanations we want. Consider the case of COMPAS, the algorithm used in American courts to estimate the probability that a defendant would commit another crime, and which was used as decision support in determining the length of the sentence. It was shown that although COMPAS did not have access to any direct information on the ethnicity of the accused, this was inferred from other factors, such as surname and area of residence. In this case, even if we had tried to explain COMPAS's decisions via a simpler algorithm, it would probably have told us that the decision depended on factors such as name and origin, but it would not have directly revealed the racial bias underlying the decision.
In other cases, explanations may be too succinct to be really useful. An interesting article on the limits of explainable models shows the image of a husky dog, correctly labelled by an algorithm. The saliency map showing which areas of the image were most important in determining the label also seems reasonable, as it highlights the dog's snout. However, when it is compared with the saliency map that would lead to the label 'transverse flute', the two seem indistinguishable! This raises the question: would a saliency map really be able to make us suspicious in the event of a wrong decision?
Counterfactual explanations, too, are not always useful. Let us take the example of the loan and imagine that the explanation is correct (i.e., the applicant could indeed obtain the loan if he had a permanent job), but that the applicant could also obtain the loan simply by filling in an additional form. In that case, the counterfactual explanation we would like is the one that shows the easiest solution for the applicant, namely: 'if the applicant had filled in the additional form, he would have obtained the loan'. Unfortunately, in many cases we cannot guarantee this 'minimality'. Indeed, in some cases it is not even easy to define which counterfactual explanations we would like.
These are not the only problems with explanation methods. With black box algorithms we are often not sure what information is actually being taken into account, and while it is relatively easy for an explanation method to show which information is being used, it is very difficult for it to realise that some information has been left out. Furthermore, a decision may result from errors or biases in the data, and this too is difficult for explanation methods to detect.
Finally, some studies have cast doubt on the effectiveness of explanation methods by showing that, by altering even imperceptibly the models we want to explain, the explanations can be manipulated at will. To give an example, an algorithm that assigns loans solely on the basis of gender could be altered so that it continues to give the same answers, but in such a way that many explanation methods would not notice: they would still produce perfectly plausible explanations that would in no way make one suspect the existing bias.
Long live transparency
From the discussion so far, it appears that we are far from being able to satisfactorily explain the decisions of black box algorithms. But then why do we continue to use them? Are there alternatives? At the opposite end from black box models there are the so-called 'transparent' models. These are interpretable algorithms, i.e. algorithms whose inner workings we understand and whose decisions and behaviour we can explain. However, there are several reasons why this class of algorithms struggles to compete with black boxes.
In general, inventing transparent algorithms that work as well as black box algorithms is not easy. The hundreds of thousands of parameters (or knobs) that make black box algorithms black boxes are also their strength: they make them extremely flexible and able to perform well on a wide variety of problems, from solving differential equations to recognising subjects in images. Moreover, black box algorithms have proven to be much better than transparent ones at revealing hidden patterns in data, and thus very useful for all those problems that we do not fully understand or that are too difficult to describe in detail (think again of image recognition).
Transparent algorithms, on the other hand, must try to do the same job by exploiting what we already know about the problem and looking for sophisticated solutions that do not use 'too many parameters'. They are often based on problems that we would know exactly how to solve, but whose exact solution would take too long to compute. In general, developing transparent algorithms that are competitive with black box algorithms requires much more effort from researchers. And it is not always worth it.
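To give a sense of what 'transparent' can mean in practice, here is a sketch (with invented data and invented feature names) of a model whose entire decision rule is a handful of numbers that can be read and audited directly.

```python
# A hedged sketch of a "transparent" model: a logistic regression with a few
# named features, where the sign and size of each coefficient can be read
# and discussed directly. Data and feature names are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "years_employed", "existing_debt"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Invented ground truth: approval helped by income and stability, hurt by debt.
y = (1.2 * X[:, 0] + 0.8 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Three numbers describe the whole decision rule, so it can be inspected directly.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```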
When it is better to use what we understand
Indeed, for many everyday applications, black box algorithms are a formidable tool and have marked a huge step forward. Until the 1990s, developing an algorithm that could recognise images accurately was almost science fiction. Today, neural networks, a popular family of black box algorithms, make it a task within the reach of an undergraduate computer science student. But as always, it is good to be aware of the risks: as the European Artificial Intelligence Regulation has reminded us, we are not prepared to accept the same level of risk for every application. For some decisions it is crucial to have guarantees on how they are made, and it is precisely in these situations that we should avoid black box models, especially considering that there are almost always equally valid transparent alternatives.
A worrying trend in today's research world is that of developing black box algorithms that are ever more powerful and versatile, but less and less interpretable. It is important not to forget that there are applications where, regardless of how powerful our algorithm is, we cannot guarantee an error probability low enough to be acceptable. For these applications, for instance those that support medical or judicial decisions, it is only right to turn to what we understand, and indeed to push research to understand even more.