Nikete (Nicolás Della Penna)
Research Affiliate, Laboratory for Computational Physiology, Massachusetts Institute of Technology (MIT)
Interested in algorithms and mechanisms for human collective cognition: theory and applications (particularly around medicine, markets and bubbles).
Florian Glatz and AI researcher Nikete (Nicolás Della Penna) talk about new developments in AI, training data and computers judging legal cases.
FG – Florian Glatz
NI – Nikete (Nicolás Della Penna)
FG: Nikete, you are an AI researcher and you have been researching at MIT for a number of years.
NI: I have worked with an AI research group at MIT for a number of years, and I have been affiliated with them since last year. The unifying theme of my Ph.D. thesis is the study of how to make better decisions, be it from past decisions or be it from the input of many experts. Especially when you are not forcing decision-makers to take a specific position, but when you are advising them, and they are ultimately the ones who decide. So, for example, the doctor in the medical system decides what to give to each patient. When the doctor then checks what the patients actually did, you can call this compliance awareness. So, the agents might or might not comply, and the learner might or might not be aware of the compliance.
FG: Your main field of research was the medical field …
NI: Yes, the motivation is the medical field. You could also think of other situations where you advise your clients and your client might or might not follow your advice.
FG: Today, lawyers perceive the status quo of artificial intelligence to be what you were doing ten or twenty years ago, if not more. Whereas in areas of research such as the medical field, where there is a lot of money available for research, you are already doing much more complex things.
NI: I think part of the reason for this is the size of experiments. Different industries have different propensities for experimentation. An easy way to characterise this: in industries such as the medical field, experimentation is cheap relative to the pay-offs, and outcomes matter more than reputation. These are circumstances where you expect to see more experimentation.
FG: How do you see the legal industry as different?
NI: The legal industry is diverse. Within it, we would expect the highest rates of innovation in the places where reputation matters least. So when I go looking for clients in the legal industry – I have not had any – I am looking for places where reputations do not matter, where pay-offs are large, and where experiments are cheap. Class actions are one place where these things could come together. I expect American biomedical damages litigation to be the place where bleeding-edge machine learning is used first, to find victims and so on.
Germany seems to be much more reputation-driven. It is much harder to experiment. Things that have big potential pay-offs seem rare.
FG: Today, AI is mostly applied to problems in information retrieval. Start-ups such as Leverton use AI to scan through thousands of documents, such as rent agreements, to discover key information.
NI: That is also, admittedly, by far the most industrialised aspect, the one requiring the least investment to productise. The open-source stack for information retrieval is insanely developed. You can start a start-up with very small means and just see what you find.
FG: Exactly. So what are the kinds of innovation that you are already dealing with in the pharmaceutical and biomedical industry that you think are suitable to be applied to the legal field?
NI: Transfer learning is a big topic. In Germany, people always say that AI will not work in law due to a lack of training data, because court decisions are generally not publicly available. What transfer learning lets you do is this: you have a source domain that is data-rich, which is what you learn from. That could be US law, which is largely public, and maybe the laws of other European countries that are similar to the German system but have more public data available online. And then there is the target domain where you do not have as much data, e.g. Germany.
Another way of thinking of this is that you have one data source that is abundant but biased, and another that is scarce but unbiased. Put differently: you have a lot of data, so low-variance estimates, but the data is from something else, so it is biased. And you have small amounts of data on the right thing, i.e. with very low or no bias, but with very high variance, because you cannot estimate the underlying function well. Transfer learning has been very successful in such applications, especially with regard to languages.
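The trade-off Nikete describes, blending an abundant-but-biased source with a scarce-but-unbiased target, can be sketched with a toy estimator. All numbers and the weighting below are invented for illustration; real transfer learning is far more sophisticated, but the bias/variance logic is the same:

```python
from statistics import mean

# Abundant but biased "source" data (e.g. a data-rich foreign jurisdiction),
# systematically shifted away from the quantity we actually care about.
source = [12.1, 11.8, 12.3, 11.9, 12.0, 12.2, 11.7, 12.4]
# Scarce but unbiased "target" data (e.g. the few decisions available at home).
target = [8.2, 9.0]

true_value = 10.0             # the (normally unknown) quantity being estimated

source_only = mean(source)    # low variance, but biased
target_only = mean(target)    # unbiased, but high variance

# Shrink the noisy target estimate toward the stable source estimate.
w = 0.5                       # illustrative weight; tuned in practice
blended = w * target_only + (1 - w) * source_only
```

Here the blended estimate lands closer to the true value than either source alone, which is the basic reason borrowing strength from a biased but plentiful domain can pay off.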
FG: Can you describe an example?
NI: For example, there have been large models for generating text that were trained on a rich corpus of long-form internet text from Reddit, some 40 gigabytes of text. This model produces output that looks quite a lot like English; stylistically, it can reproduce the kind of English that people write on the internet extremely well. One experiment that worked extremely well, and that is very encouraging, is this: you take that trained language model and apply it to another domain, for example poetry, by fine-tuning it on a very small but unbiased data set of poems. It turns out that this works very well, and you can do it for many domains, not just poetry. You are using data sets ten thousand times smaller, and you are getting results that look very convincing. Because the big language model has already learnt the structure of the language, it can quite quickly learn the higher-level things, in the case of poetry about meter and rhyme, without having to retrain the lower-level characteristics.
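The pretrain-then-fine-tune pattern can be miniaturised with a bigram language model: train on a "general" corpus, then continue training on a tiny "poetry" corpus. The corpora and vocabulary size here are invented, and real systems use neural networks and gigabytes of text, but the mechanics of reusing what was already learnt are the same:

```python
from collections import Counter

class Bigram:
    """Toy bigram language model with add-one smoothing."""
    def __init__(self):
        self.pair = Counter()   # counts of (word, next_word)
        self.ctx = Counter()    # counts of the first word of each pair
    def train(self, text):
        words = text.split()
        for a, b in zip(words, words[1:]):
            self.pair[(a, b)] += 1
            self.ctx[a] += 1
    def prob(self, a, b, vocab_size):
        # add-one smoothed estimate of P(b | a)
        return (self.pair[(a, b)] + 1) / (self.ctx[a] + vocab_size)
    def score(self, text, vocab_size):
        words = text.split()
        p = 1.0
        for a, b in zip(words, words[1:]):
            p *= self.prob(a, b, vocab_size)
        return p

V = 1000  # assumed vocabulary size for smoothing

general = "the court held that the claim was dismissed and the court awarded costs"
poems = "shall i compare thee to a summers day thou art more lovely"

pretrained = Bigram()
pretrained.train(general)     # "pretraining" on the abundant corpus

finetuned = Bigram()
finetuned.train(general)
finetuned.train(poems)        # continue training on the tiny target corpus

test_line = "compare thee to a summers day"
```

After seeing only one line of "poetry", the fine-tuned model already assigns the held-out poetic phrase a higher probability than the general model does.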
FG: Do you think this sort of transfer-learning could be applied to the logic of law?
NI: Laws are just texts…
FG: It is just text, but it has its own logic, which lawyers are very proud to have studied for decades.
NI: To the extent that there is more structure, and that structure is more logical, it makes the problem easier. A neural network is a very logical thing. The more patterns it finds, the easier the learning becomes.
FG: The lack versus abundance of training data is a big topic in the legal field in Germany. We have talked about the idea of transfer learning, but I remember you telling me about another interesting strategy to compensate. All the court decisions ever published in Germany are somewhere in the archives of law firms, yet there is no combined archive. You mentioned that there is a way to have an AI learn from all these data sources without revealing them.
NI: There are actually two ways, one of them more practical than the other: homomorphic encryption and multi-party computation. In homomorphic encryption, everyone encrypts their data and the algorithms work on the encrypted data to give an output; you can then use the keys you used for encrypting to decrypt the output. Despite the fact that you did the computation on the encrypted data, the output decrypts to the answer. There is a homomorphism between the encrypted text and the plain text. This is sadly rather impractical at any scale you care about.
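The homomorphism can be demonstrated with textbook Paillier encryption using deliberately tiny, insecure parameters; this is a sketch of the idea only, and real deployments would use large keys and vetted libraries:

```python
from math import gcd

# Textbook Paillier with toy parameters (insecure, illustration only).
p, q = 17, 19
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
g = n + 1

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def enc(m, r):
    """Encrypt message m with randomiser r (r must be coprime to n)."""
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

c1 = enc(5, r=7)
c2 = enc(7, r=11)
# Multiplying the ciphertexts yields an encryption of the *sum* 5 + 7,
# even though whoever multiplies them never sees the plaintexts.
encrypted_sum = (c1 * c2) % n2
```

Paillier is only additively homomorphic; fully homomorphic schemes that support arbitrary computation on encrypted data exist but remain far more expensive, which is the impracticality referred to here.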
FG: What about the multi-party computation as an alternative approach?
NI: In a multi-party computation scheme we all carry out the computation together to train an AI.
FG: Can you make the example based on, say, a thousand law firms in Germany that want to train an AI together?
NI: Yes. One possible strategy would be that all the law firms connect to a set of servers. As long as at least one of the servers is honest, none of the servers involved will be able to learn anything about the law firms’ inputs except what is implied by the output. So each law firm talks back and forth with the servers that are carrying out the computation; each server carries out its little bit of the computation, adds in some noise, and gives it back, and they do this over and over.
The important part is that it does not reveal anything about the input beyond what is in the output. So if the output is, say, a classifier to predict decisions, you will learn the weights of the classifier but nothing else. You can try to have guarantees that the output itself does not reveal too much about the inputs.
Of course, this takes two things: the multi-party procedure, so that you are not revealing the inputs as you put them in, and also an output with these particular properties, one that does not reveal too much about the input, which is also not trivial to obtain.
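A minimal version of the first ingredient is additive secret sharing; the firms, counts, and number of servers below are invented, and real MPC frameworks add authentication, noise, and much more machinery on top:

```python
import random

random.seed(0)        # fixed seed so the example is reproducible
P = 2**61 - 1         # prime modulus for the share arithmetic

def share(secret, n_servers):
    """Split a secret into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n_servers - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

# Three firms each hold a private number (e.g. a count of relevant decisions).
private_inputs = [4, 9, 2]
n_servers = 3

# Each firm sends one share to each server. A single server's view is
# uniformly random, so on its own it learns nothing about any input.
server_totals = [0] * n_servers
for x in private_inputs:
    for i, s in enumerate(share(x, n_servers)):
        server_totals[i] = (server_totals[i] + s) % P

# Only the agreed output, here the total, is ever reconstructed.
joint_total = sum(server_totals) % P
```

Privacy here fails only if every server colludes, which matches the "as long as one of the servers is honest" guarantee described above.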
FG: Would it be possible to apply this multi-party scheme today to thousands of law firms to create a shared AI model?
NI: Let us do a reality check. Today, the computations we can do with these kinds of schemes are much smaller, so the shared models we create are much less complicated than the ones you would use for predicting, e.g., court decisions. But the engineering improves, and there are applications that are very meaningful, in particular around medical data, that will have legal ramifications very early on, because people sign waivers, and the waivers were written in a world that did not have this technology. There will be people wanting to know: can you use the data that they gave only to me, without revealing it? What does it mean to “not reveal”? How can you connect mathematical notions of revealing to what “revealing” legally means?
I guess the large German pharmaceutical companies such as Bayer will surely also have legal positions on this, much earlier than the law firms that could train classifiers jointly.
FG: How would this technology be applied in the medical field?
NI: You have banks with genetic or phenotypic medical information.
FG: When you use that data as training data that could be a violation of a person’s right to privacy under GDPR.
NI: Yes. You want to train with everyone’s medical data and everyone’s genes at the same time. But there are legal reasons why it is incredibly difficult to move medical data across borders. So if you have rare medical conditions and a country only has a few affected people, you will have to learn from many people across countries. There are already huge legal consequences, and an almost inherently international flavour, with complicated international questions. The whole point is to avoid these legal constraints in a way that preserves the spirit of the law. Also: what are the mathematical notions of privacy that will, in the end, be compatible with the underlying law? Even just doing it inside of Europe is often very difficult.
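One widely used mathematical notion of "not revealing", not named in the conversation but standard in this literature, is differential privacy. A minimal sketch of its basic tool, the Laplace mechanism, applied to a count query (the data and the epsilon value are invented):

```python
import random
from math import log

random.seed(0)  # fixed seed so the example is reproducible

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Release a count query (sensitivity 1) under epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1.0 / epsilon)

# Four patients' ages (invented); we release how many are over 40 without
# letting the output depend too strongly on any one individual's record.
ages = [23, 45, 67, 31]
released = private_count(ages, lambda a: a > 40, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; the appeal for lawmakers is that "revealing" becomes a quantifiable parameter rather than a vague promise.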
FG: In one of our last issues of REthinking: Law, we discussed the legal and philosophical ramifications of computers judging legal cases. From your perspective, is AI anywhere near being able to judge legal cases?
NI: A comparable task that AI is getting increasingly better at is question answering. Today, the state of the art is high-school-level questions. But the questions a judge has to answer may not always be so much harder than high-school questions. A very bright high-school student might have some ability to predict what is going to happen, given the legal background documents. Maybe not a hundred percent, but not zero. This is the very bleeding edge: the first time that someone beat the human baseline on one of the question-answering benchmarks was a couple of weeks ago. There are not many groups that can do it, but the progress is very steady. In that sense, it is mostly engineering, not science.
Very generally, how such a system works: if you look at questions, and you look at the right answers, there is a way to transform the data of the question into the pattern that you would expect around the answer in a corpus of text. That is basically what the system is trying to do.
What do we mean by a pattern of text? It may not have anything to do with the surface representation of the words; it may have to do with the encoding. This is another place where transfer learning can work very well.
So you may train a system in a language where you have very abundant data and a rich representation of the language, and it may be quite straightforward to carry it over to a similar language.
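A heavily simplified version of "matching the pattern around the answer" is retrieval by word overlap. The corpus and question below are invented, and real systems use learned neural representations rather than raw word counts, but the shape of the task is the same:

```python
from collections import Counter
from math import sqrt

# A tiny invented corpus standing in for the legal background documents.
corpus = [
    "The court held that the contract was void for lack of consideration.",
    "The defendant was ordered to pay damages of fifty thousand euros.",
    "The appeal was dismissed because the filing deadline had passed.",
]

def tokens(text):
    return [w.strip(".,?").lower() for w in text.split()]

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question):
    # Return the sentence whose word pattern best matches the question.
    return max(corpus, key=lambda s: cosine(tokens(question), tokens(s)))
```

Asking `answer("Why was the appeal dismissed?")` retrieves the third sentence, because its word pattern overlaps the question most; modern systems do this in a learned representation space rather than over literal words.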
FG: That sounds fascinating, yet it seems like such an AI would not have developed an actual understanding of laws and their meaning. The AI would just be guessing patterns.
NI: It is very difficult to operationalise meaning or understanding for texts. But in images, for example, it is easier to operationalise what we mean by meaning or understanding. If I show you a picture of this cup and I rotate it like this, we understand this as a rotation; that is real understanding. And we already have networks whose representations are rich enough to capture a change of point of view, rotations of the object, or the colour of the cup. So it is not clear to me that such a network has no meaning or understanding of what a cup is. I mean, it is able to manipulate it.
It is not easy to operationalise meaning and understanding outside of manipulation. In fact, some people who study causal structures – this is contentious – would argue that there is no sense in which you can talk about causation without manipulation.
FG: Even if you have an AI that effectively passes the Turing test, it would still be a machine that does not understand what it does. A machine without meaning or understanding.
NI: I can make a pretty strong argument that this is a misunderstanding of how cognition works. You have this perception – most people have it – that you are the centre of experience and that your experience is reality. Now, it is easy to realise that you are not the centre of experience; although it is difficult to fully internalise, you at least understand the notion. I would argue that it also needs to be understood that you do not perceive reality: you perceive your sensory inputs. So your perceptions are not reality, but sensory inputs. If you believe in physical reality, and as far as we can tell from biology and physics it does not leave a lot of room for alternatives, then what you see is a machine that is able to predict its sensory inputs, that has some control over its environment, and that tries to stabilise its predictions of those sensory inputs.
People have historically really wanted to make your kind of argument against AI. You know, I wrote a thesis about preserving freedom; it is not like I have no sympathy for the historic argument. But it is very hard to pin down where it holds.