Who I am is a tough question, as what is relevant to define us actually changes with context. But to make a long story short, I’m a medical doctor by education and I practiced at the University Hospital in Bologna. I’m also a researcher interested in formal, fundamental questions, so some would classify me as “interdisciplinary”. I’ve been working a lot on mathematics and information-systems studies, mostly in the context of medical ecosystems.
I started at the University of Bologna, then moved briefly to Portugal and the UK, where I had the chance to spend time with wonderful mentors with whom I am still in touch today. I was a researcher at the University of Maastricht for several years, helping to set up the computational biology research line of the experimental pathology group, among other things. After that I was called to CERN in Geneva, to serve as a senior fellow of the Director General for medical applications. In between, I founded the SCimPulse Foundation, which I still direct. I am also part of several scientific organisations and commissions, including (perhaps the most relevant to the conversation to come) the NATO working group on human control over autonomous systems.
I was connected to the military in the past on the medical side, through chance encounters and because the military are extremely active in healthcare and public health. It so happened that Italy was not represented in the NATO working group I mentioned, and someone recommended me. We are tasked with reviewing the literature and technology accidents in order to identify pitfalls and recommend best practices. We discuss the weaknesses of current approaches to regulation, and design recommendations for engineering autonomous systems. It’s mostly theoretical. We don’t have a lab where we can grow our own evil puppets (LOL) and see whether we can train them out of evilness.
Although the group has access to classified material in principle, most accidents related to autonomous systems actually happen in the public domain, outside the military context, where any European citizen could become aware of them: reports on events in which artificial intelligence systems, which may have been involved in decisions, did not produce the expected outcome are published very often, and they can involve anything from assisted-driving systems all the way to decision-support systems in healthcare. Civil society on the other side of the Pond has been extremely prolific lately, producing examples of bias in the classification of suspects in police investigations, or of the exclusion of entire population groups from proper follow-up care or from access to credit.
Asking the right questions
When you train any machine learning system on a data set, you normally think that you are assigning a task to the machine. In truth it is always slightly different, in the sense that the task is encoded in such a way that you award the machine “gains” for solving a puzzle through the data that you’re giving it. And the machine is only trying to get those gains; it is not making any sense of the task. This means that any loopholes in your design will be exploited very efficiently by the machine! When this happens in a gaming context, researchers react enthusiastically, claiming the machine “thought outside of the box” or “learnt to cheat”… but imagine black people receiving worse follow-up care because their social context is associated with lower rates of follow-up. Would you be satisfied with a recommender “guessing” that being black, or coming from that social context, is by itself a predictor of low effectiveness in care, and thus recommending not to follow up with your patient?
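To make that concrete, here is a minimal, hypothetical sketch (invented data and effect sizes, scikit-learn assumed as a tool, nothing from the working group’s material): a classifier rewarded only for reproducing past decisions will happily latch onto a social-context proxy, because nothing in its objective separates a cheap shortcut from a legitimate signal.

```python
# Hypothetical sketch: a model rewarded only for accuracy exploits a proxy.
# All variables and effect sizes are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Invented "social context" flag, correlated with historical follow-up rates,
# not with any clinical need for care.
social_context = rng.integers(0, 2, n)
clinical_need = rng.normal(0.0, 1.0, n)

# Historical outcome: follow-up happened less often in one social context,
# regardless of clinical need (the bias baked into the past).
followed_up = (rng.random(n) < 0.8 - 0.5 * social_context).astype(int)

X = np.column_stack([clinical_need, social_context])
model = LogisticRegression().fit(X, followed_up)

# The machine's "gain" is simply reproducing the biased past...
print("accuracy on the biased past:", round(model.score(X, followed_up), 3))
# ...and the learned weights show it leans on the social-context proxy,
# not on clinical need, to earn that gain.
print("weights [clinical_need, social_context]:", model.coef_[0])
```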
The data is biased because the system is; the AI simply learnt to give you the most rewarding reply, without any care for the amplification of bias (because that was never part of the game for it).
And what do we, humans, learn from biased data? That it should not be used for classification, and that we should use it instead to focus on the pitfalls we want to challenge.
So the issue boils down to the following: should we ask ML/AI to guess from the data which category a fellow human falls under, striving for efficiency, so as to exclude her or him from a certain benefit? Or should we rather try to force the machine to assist us in finding effective (not efficient) ways to include them?
Adopting ML/AI today is a bit like a magic trick… but in the show-biz sense, not in the silver-bullet sense. When you go to a magician’s show (say David Copperfield’s, I like his art quite a lot), he will distract you, and then something appears in the middle of your supposed attention field that you were not really focusing on, simply because you were not paying attention. Similarly, most applications of autonomous systems attract a lot of hype under the pretence of doing something very fast and efficiently, and this distracts us from what they won’t do, and from what they won’t let happen.
The problem is not whether you could use the data set. Could you curate the data set to make it unbiased? At face value, you could obfuscate the sensitive variables and claim the dataset has no bias. But hidden variables can be inferred even if they are not in the database, just like an exercise in missing-value imputation. The big F is known to be able to infer data about a person by crossing information between an external ID and the data of her or his contacts on the platform…
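A toy sketch of why obfuscation fails (invented columns, not the platform’s actual pipeline): drop the sensitive variable from the table, and a model can still impute it back from correlated fields, exactly like a missing-value imputation exercise.

```python
# Toy sketch: "removing" a sensitive variable does not remove it,
# because correlated columns let a model impute it back. All data invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 20_000

sensitive = rng.integers(0, 2, n)                          # the column we "remove"
postcode_band = 2 * sensitive + rng.integers(0, 3, n)      # strongly correlated proxy
employment_type = (sensitive + rng.integers(0, 2, n)) % 3  # weakly correlated proxy

X = np.column_stack([postcode_band, employment_type])      # the "anonymised" dataset
X_train, X_test, y_train, y_test = train_test_split(X, sensitive, random_state=0)

imputer = LogisticRegression().fit(X_train, y_train)
print("sensitive attribute recovered with accuracy:",
      round(imputer.score(X_test, y_test), 3))
```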
Thus, no… the problem does not lie exclusively within the data; it lies with our intended purpose for the data and for ML/AI. We are asking questions that we take at face value, as if we were posing them to a sentient entity that would of course interpret them contextually, and we forget that when talking to a machine we should instead question, ourselves, the questions we ask of it. The true bias is that we are using extremely effective cheating systems to multiply what we’re already doing, instead of asking ourselves how we might improve certain situations. That’s the direction we really want to go in, I believe.
For example: I find someone burgling an apartment, and I have to judge (let’s imagine I’m a judge) whether this person will re-offend. To me this becomes a very clear problem. Let’s imagine a scenario where we all tend to feel strongly that there is a need for intervention: a man who is stalking his ex-wife. Of course we want to prevent him from reoffending. Right? That’s without a doubt.
Now, as a human, I ask myself: “Is he likely to reoffend? How can I prevent it?” There might be ideas about temporary restrictions, coupled with therapy and education, and establishing a red line for the time window in which re-offence is most likely…
But when I ask the same question of a machine, it could be shortcut to simply mean “how often does re-offence by this category of person get prosecuted?”, and instead of recommending the best strategy to prevent re-offence and help all the parties move on, it will simply remind me that we are harder on marginalised individuals with complicated mental-health conditions, and lenient with those who are well off enough to obtain good legal assistance…
This is not necessarily a tech-only question but one about the workings of the justice system, since that is what we are talking about: are we using the right tools to ensure that people do not re-offend?
The crucial point when ML/AI gets into the picture is scale. If I am the only judge and I’m working on my own locally, the tools I have to judge with are limited, and bias creeps into my practice as I exploit heuristics to deal with the burden of my task. Luckily, the bias in my heuristics is local, can be made sense of, and will show up in audits… I can ultimately be trained out of some of it (and maybe into other biases… nobody is perfect).
But if I’m deploying the largest system, extremely fast and effective at browsing data, historical information, logistical information, and more, why am I still asking questions as if I were a single person with limited access to information, imagining the system will magically produce the best reply, in a context in which I am even avoiding the challenge of aligning my goals with those announced in our Constitutions?
I guess what I am pointing to is that we should work out what questions AI enables us to ask that we could not ask before. What constructive questions? And what questions need to be avoided or reformulated?
Rather than just answering questions we already have, faster and dumber.
Right now, we are not trying this out for real. The AI revolution at the moment is justified by the idea that we can have machines doing tasks with less effort, at scale… but we are still in a phase in which we “humanise” them, and don’t really work well with them.
This is not going to be easy. We don’t fully understand human reasoning either. Look at economics, for instance, and the work of Professor Ole Peters, who has identified a formal mistake at the roots of the most common formulation of expected utility theory. Most of the research in this field, if I may say, is low quality because it tried to fit human thought processes to a shakily grounded model of wealth dynamics… thus claiming human irrationality in the most stressful contexts (a bias, may I say?). When he worked out the theory from the corrected formalism, he found that what was considered irrational decision making actually makes sense, formally. If something as simple as deciding on a bet between two different types of return dynamics (additive vs multiplicative) has seemed so dramatically complicated to academics for centuries, I might be allowed to state that our assumptions about how we decide in more complicated situations are wrong. We are often asking ourselves different questions from those we think we are formulating. But once we translate these into machine language, the machine is extremely proficient at optimising for precisely the question we have formulated. And this is a shortcoming on our side; the onus is ours to begin with.
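A small numerical illustration of that additive-versus-multiplicative point (my own sketch of the coin-toss example usually associated with ergodicity economics, with invented numbers, not Peters’ derivation): a repeated multiplicative bet can look attractive on the ensemble average while the typical individual trajectory shrinks.

```python
# Sketch of a multiplicative bet: win -> wealth * 1.5, lose -> wealth * 0.6.
# The expected factor per round is 0.5*1.5 + 0.5*0.6 = 1.05 (looks favourable),
# but the time-average growth factor is sqrt(1.5 * 0.6) ~ 0.95 (shrinks).
import numpy as np

rng = np.random.default_rng(42)
players, rounds = 1_000_000, 50

wealth = np.ones(players)
for _ in range(rounds):
    wins = rng.random(players) < 0.5
    wealth *= np.where(wins, 1.5, 0.6)

print("ensemble mean wealth:", round(wealth.mean(), 2))       # stays well above 1
print("median player wealth:", round(np.median(wealth), 4))   # collapses towards 0
print("players below starting wealth:", round((wealth < 1.0).mean(), 3))
```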
There is research into rethinking the questions we ask… more importantly, there is research into making explicit how we formulate the questions we ask. But you understand that in a moment of hype about ML/AI this is not a sexy topic, despite being an extremely important piece of work within machine learning and artificial intelligence. And so most of the discourse around human control over artificial intelligence is focused today on explainability, or the reproducibility of decisions (a historical inheritance of the old claim that artificial neural networks were too much of a black box). One of the requirements for a new decision to be accepted among us humans is that it has to be convincing in its arguments and assumptions. So, if you have something which predicts the next value of your dice every time you throw it, but you cannot explain how it works, you think it’s a trick, right? Exactly like with David Copperfield. You want to understand where information is leaked for the prediction to be faked, if the prediction itself cannot be explained. This is to say that explainability is extremely important (some co-workers and I work on the problem of the explainability and interpretability of autonomous systems’ decisions as well)… but it’s not enough; it’s just half (or maybe I am too optimistic, and it is less) of the work we need… rethinking how we question the world with these new tools is at least as important.
Healthcare and AI
Here’s an example from the medical field: predictive medicine. Around the 1970s, a number of relatively simple tools were introduced that were meant to assist doctors in predicting the likelihood of disease… scoring systems which would assign the person in front of you to a class sharing a certain risk of experiencing some clinically relevant event in the coming 5-10 years.
With progress in data crunching, the medical community has been expecting these classification systems to become more refined: they can digest more data, so they can be more precise in the fineness of the classes. That’s the expectation of precision medicine: not just knowing that a person has more than a 20% likelihood of having an event in 10 years, but distinguishing among those who might have likelihoods of 19%, or 30%, or maybe even 50%. But this hasn’t really happened.
This relates a bit to the work of Ole Peters in economics. In the 1970s, with no other tools available, the ingenuity of the doctors tackling the problem of prediction was to use methods that were also used in other industries. You guessed that if two patients shared a certain relevant trait or profile, they would share the same risk. And because they could only handle a few variables at once (cholesterol, blood pressure, diabetes, smoking, and so on), these profiles were relatively simple and “inclusive”.
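In spirit (this is an invented toy score, not Framingham or any validated instrument), those early tools amount to something like this: a handful of coarse, binary risk factors, a few points each, and a lookup from the total to the risk class that everyone with the same profile is assumed to share.

```python
# Invented toy illustration of a 1970s-style scoring system.
# A few coarse factors, points per factor, and a shared class-level risk.
def toy_risk_class(cholesterol_high: bool, bp_high: bool,
                   diabetes: bool, smoker: bool) -> str:
    points = 2 * cholesterol_high + 2 * bp_high + 3 * diabetes + 3 * smoker
    if points >= 6:
        return "high risk class (>20% event rate over 10 years)"
    if points >= 3:
        return "intermediate risk class (10-20%)"
    return "low risk class (<10%)"

# Everyone with the same coarse profile lands in the same class,
# whatever their individual trajectory looks like.
print(toy_risk_class(cholesterol_high=True, bp_high=False, diabetes=False, smoker=True))
```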
When researchers try to enrich these profiles, what happens is that you hit a new barrier: you are confusing the question about the destiny of the person in front of you right now with the question of the destiny of every similar person. The more precise you get, the more the two diverge.
Imagine you are entering a lottery, and you know that at the end of the day the lottery pays off (the overall distribution of prizes per chance class versus the cost of the entry ticket). So it’s a fair game. But you may play the lottery your whole life without winning a single time. And if you go bankrupt before winning, there’s no point in knowing it was fair… This happens because it is fair for an infinite set of players, but one player in realistic conditions will never be able to play an infinite number of times…
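A quick simulation of that lottery argument (all numbers invented): the game below is exactly fair in expectation, one unit staked and one unit returned on average, yet almost every player with a finite bankroll goes broke before ever winning.

```python
# Invented fair lottery: a ticket costs 1, pays 1000 with probability 1/1000,
# so the expected value per ticket is exactly 0. Still, a finite bankroll
# usually runs out before the first win.
import numpy as np

rng = np.random.default_rng(7)
players, bankroll, ticket_cost, prize, p_win = 10_000, 50, 1, 1000, 1 / 1000

broke_before_first_win = 0
for _ in range(players):
    money, won = bankroll, False
    while money >= ticket_cost and not won:
        money -= ticket_cost
        if rng.random() < p_win:
            money += prize
            won = True
    if not won:
        broke_before_first_win += 1

print("fraction broke before their first win:",
      round(broke_before_first_win / players, 3))   # roughly 0.999**50 ~ 0.95
```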
To continue…
In ecology and evolutionary biology, they have been working on formalizing fitness landscapes, where you interpret the meaning of certain conditions as complex, formal environments, and you can understand how easy or difficult it is to move from one position to another. In economics, there are methods to infer your likely number of steps before bankruptcy, starting from a certain point and knowing the likelihood of gaining or losing a certain amount at each step. Now, if you could unite these, you could have a very different beast in predictive medicine, instead of trying to collect everyone’s data and behaving in medicine like you do in marketing: if you purchase beer, you are very likely to purchase chips. But there are many more data points in medicine; it’s not the same as buying a beer and being likely to buy chips.
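That second idea is, in essence, textbook gambler’s-ruin arithmetic. A sketch with invented numbers (not a method proposed here): given a starting reserve and a per-step chance of gaining or losing one unit, the expected number of steps before ruin can be computed in closed form and checked by simulation.

```python
# Gambler's-ruin sketch with invented numbers: start with 10 units,
# gain 1 with probability 0.45 or lose 1 with probability 0.55 each step.
# With a losing bias and no upper target, the expected steps to ruin is
# start / (p_lose - p_gain) = 10 / 0.10 = 100.
import numpy as np

rng = np.random.default_rng(3)
start, p_gain, trials = 10, 0.45, 10_000

steps_to_ruin = []
for _ in range(trials):
    capital, steps = start, 0
    while capital > 0:
        capital += 1 if rng.random() < p_gain else -1
        steps += 1
    steps_to_ruin.append(steps)

print("theoretical expected steps:", start / (1 - 2 * p_gain))
print("simulated average steps:", round(float(np.mean(steps_to_ruin)), 1))
```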
And it’s not only that there are more data points; it’s that it’s a really different question. If you look at the general population, people who smoke have, in general, a higher probability of getting lung cancer. This is like marketing: people who purchase a beer are likely to purchase chips. But if you are asking “is Merkel going to have cancer?”, now that’s NOT a marketing question. Because in marketing you care about the mass; you want to sell more chips, and if somebody doesn’t buy them, okay, it happens. But in medicine, it makes a huge difference.
If, on average, people with these markers are more likely to have a disease, does this particular patient have a higher risk in the next one, two, or three years? And being a different question, it requires completely different mathematics. If we invested in this kind of research over the next 10 years, with this kind of resources, maybe we would be closer to a much more interesting product. But if everything we do with AI looks just like speeding up what we were doing 20 years ago, we will never get to a point of real improvement.
Even if we make these tools auditable, we would only be able to learn that the mistake was made in good faith. But does that make it any better for the people affected?
If you look at ethics in a pragmatic way, that alone is not really answering the needs.
Justifying Machine Learning
We may be pushing AI onto the markets because we are in a rush for “supremacy” on the market. We have done it before. For example, after the diffusion of web marketing systems, medicine received a lot of investment to dematerialize medical data and make it available on the web. The idea was that this would improve metrics of success, whether patient involvement in care or smoothing out pathways across different services. It was done with precisely the same approach that we are seeing today in machine learning: because it was a success for marketing, it would be a success in medicine. The result was electronic health records that were mostly designed around the existing practice of data collection, a practice justified by billing.
But 20 to 30 years later, the impact of electronic health records on medical practice isn’t that great: they take a lot of time away from care. Humans have to put in a lot of work to digest the data into a form the system can deal with. And this data is taken out of its relationships. If the system is just speeding up the wrong process, the outcome won’t be magical.
What is happening right now is that the adoption of AI technology in Europe is being accelerated. But we’re massively risking getting the wrong results, because we’re not asking the right questions.
Medicine is one of the fields where there isn’t really a huge gain from using AI. But a lot of money is being spent on both its development and its marketing, and this money absorbs other investments. The money invested in trying to train an AI isn’t going into training clinicians, into getting better tools, into your patients, or even into redesigning hospital spaces.
My argument isn’t that we shouldn’t look into AI, but the true question should be: how do we get there? We have never questioned what the right moment is, or what the mechanism, the funnel, should be for AI to enter medicine. At the moment, it’s a vastly unregulated system. As long as a company claims it has a product, it can start investing and working on it, absorbing public resources.
Well-established practices should be studied: questions figuring out how much of a certain practice or investigation is informed by the current limitations of the system, and how much by the declared, agreed-upon goal of the system. These are slightly different problems. We should also look at the states or the communities where the power lies: they should try to enforce systems of control over the introduction of these tools, not systems that are extremely punitive. Our goal is not to make it impossible for innovation to enter, but not to make it too easy either.
A good example of this is what’s happening with the FDA and the accelerated approval of drugs. A recently published review of the drugs that enjoyed accelerated approval by the FDA over the last two decades found that none of them showed any survival gains. So we created a mechanism for innovative, life-saving products to enter the market quicker, but in reality none of them actually did what they were supposed to do.
Stifling innovation?
We’re creating a lot of regulation that contains redundant regulation. It is well intentioned at the European level, but it is countered by national governments. At the same time, a lot of the conversations and negotiations don’t happen at the policy level, but in discussions about standards.
These are all tricks. Both standards and most policies are tricks. Take the GDPR, for example. A few years before it actually entered into force, I knew it would most likely be more challenging for small organisations to be GDPR compliant than for large ones. So, despite the declared goal being “we are going to rule how Google and Facebook access your data,” we already knew it was going to hinder small organisations’ activities and actually be a treasure trove for the big ones. And if we look at the reviews two or three years later, that’s exactly what happened.
Most of the policy created isn’t really there to regulate. And it becomes a hindrance, because that was the goal from the beginning.
Another problem is lobbying: we don’t have a lobbying registry, so we don’t know how much time and how many gifts our politicians receive from lobbying groups.
There is a lot that could be improved about how Europe works. And this is not the nature of policymaking; this is how policymaking is implemented in Europe. That’s a problem. It’s a systemic problem, but not a problem of principle; it is really a symptom of the system. The problem lies not in the idea that we want policies to regulate issues; the problem is how we make them. And the same happens with standards. Most standards are imposed only after the companies that de facto rule the market have already agreed about them on the phone. I’ve hardly ever seen a standard imposed before the large companies were already ready for it.
We are allowing it.
AI could definitely be regulated, and I think it should be. But there is a public misconception about the advantages of research. You need to accept that you cannot scale up at will, without any consideration for the damage you can cause. You need to be informed about the risks, and you need a plan B for them. And this doesn’t hinder progress: there is no advantage in having research that can scale up without any consideration for the damage it could cause.
When it comes to biometrics, for example, we are at a point where we should have asked the questions before, because maybe it’s too late now. When authorizing the first research on these tools at scale, we should have asked: what are the implications of generating tools that unify your physical body with your identity? Because in all our societies, your identity is demonstrated by documents you carry or by relationships you have; it has to be claimed, and verified against data. Your body is not automatically your identification. The moment you introduce biometric tools, whether they are effective or not, you are claiming a unification of body and identity. This is the first time it is happening in human societies. We have never had companies, humans, or states able to impose an identity on a body. And we have never reflected on what changes this implies for the social contract by which we live together. And this is extremely radical. The point is not whether we want to be faster in telling whether you are yourself when you are boarding the train. The point is: what are the implications when these tools permeate all fields of our life? And maybe now it’s a bit too late to ask, because it’s out there.
I mean, there are other ways to try to skew how things are adopted and used. Even China, extremely powerful as it currently is, isn’t alone on this planet: it depends on market and diplomatic relationships.
The Future of AI
We have to accept that a number of things that were tolerated before, because they were convenient for everyone (the industry, corrupt officers, and whoever else), should now be regulated in a slightly different way. We cannot go through a phenomenon while ignoring its scale, with rules written to account for, say, 413,000 tourists, and then try to apply similar rules when a country next door is crumbling.
When you try to regulate it as if we were regulating Western immigration for professionals, all the wrong and strange narratives about migration and sustainability start to emerge. When, honestly, it’s your mistake for trying to regulate one phenomenon with the rules of something completely different. And the same applies to a single person, whatever their role: a doctor, a judge, a policeman. We love to have systems whose applications scale up extremely fast, onto basically any device around you tomorrow. Even if I, as a doctor, wouldn’t use it, maybe the hospital is using it, maybe the medical registry is using it.
I don’t think there is a shortcut. Unfortunately, there are some major powers in this field, but we could frame it. Our best shot, however, is trying to coordinate a movement, independently of the singular individual activity of producing reflections and conversations around the topic. There is nothing bad in having a bit of activism about things that are fundamentally screwing up the way your life and your society work.
And AI, like climate change, may call for something of this scale. Because it’s being treated as “just the next technology,” but it’s completely different from a smartphone, and the smartphone already had a massive impact on society. It’s a Trojan horse, under the disguise of making things easier. And it’s a terrific tool for speeding up everything we don’t like about our society.
It’s frustrating. And it’s similar to my participation in the working group: we have no guarantee NATO will pick up our recommendations. We express critical opinions, suggesting reflection and investment in human resources. We analyze accidents of the past, for example the famous incident where a ship got lost because the GPS was signaling that it was in a different location. This is something that would never have happened to the Vikings; they were reading the stars. Until the 19th century, every sailor needed to be able to navigate by compass and stars. Now they were navigating on GPS, everything seemed fine, and nobody even checked.
And the same can happen with automated systems. Invest in people and train their skills; that’s the only resource you should really aim at.
At the end of the day, countries can still decide to field a drone with the decision to kill enabled, running on some biometric system from China, that will mistakenly kill the person next door during a wedding. As individuals, our power is very limited. And independently of all that, the notion that an individual can change society on their own is a very neoliberal idea.
We should be trained in the basics and the foundations, and not use AI as a leader dictating what we should do, but as a nifty tool that could maybe help us.