Jiaxuan Liu

Dr. Michele Loi: The Design Logic Behind Ethical AI in Public Administration

In May 2024, the Saint Pierre International Security Center (SPCIS) launched an interview series titled “Global Tech Policy at the Forefront: Conversations with Leading Experts.” The initiative aims to build a deeper understanding of how emerging technologies such as artificial intelligence (AI), blockchain, biometrics, and robotics are shaping global governance and transnational public policy. The series features leading experts from academia and industry, and the interviews are published on SPCIS’s platforms, including its WeChat public account, website, and LinkedIn.

 

On July 24th, Dr. Naikang Feng interviewed Dr. Michele Loi, principal investigator for the philosophy component of the project "Socially Acceptable AI and Fairness Trade-offs in Predictive Analytics" and Senior Scientific Manager at Algorithmwatch.org. The interview highlighted Dr. Loi's views on the ethical and normative requirements of AI, and on designs that can safeguard AI use given the capacity gaps within the public administrations responsible for implementing these technologies. This article summarizes the key messages from the interview; the content has been reviewed and authorized for publication by Dr. Michele Loi.

 

Dr. Feng Naikang: First of all, thank you so much for accepting our invitation for this interview; I know you are very busy. I came across your article on accountability from the public administration perspective and found it particularly interesting. To start, could you briefly introduce yourself, your background, and your research related to emerging technologies?

 

Dr. Michele Loi: Yes, of course. I used to be a researcher at the University of Zurich. Currently, I work for a non-governmental organization called AlgorithmWatch, where I am finishing the coordination of a project on fairness in machine learning.

 

The research on AI use in public administration that you mentioned was initially carried out as a collaboration between the University of Zurich's Digital Society Initiative, where I worked, and AlgorithmWatch, a prominent European NGO. The organization has long advocated for greater transparency and accountability as AI is introduced into the public sector, especially in Germany at the Bundestag level, where it has pushed for legislative proposals to create a registry of the AI applications used by public bodies.

 

In my research with them, we co-developed guidelines and a methodology for assessing the use of these applications. The collaboration also involved a professor of law from the University of Saint Gallen and the canton of Zurich. A few months ago we started a new project with public authorities in Switzerland, in which we tested the approach set out in the 2021 AlgorithmWatch report: the impact assessment was turned into an online tool and an even more comprehensive methodology.

 

Dr. Feng Naikang: Can you tell us more about your experience of collaborating to shape policy on AI in the public sector? Also, to what extent has AI been incorporated into government operations in European countries?

 

Dr. Michele Loi: Sure. Both projects are set in the Swiss context. To conclude my earlier point, the next step is to return to the German context within the EU, which now has the AI Act. This means the initial proposal for a registry and the related legislative proposals need to be rethought, because regional governments must now follow the procedures set by the AI Act, which requires a fundamental rights impact assessment. This is different from our original approach. I also published an article on transparency and accountability related to this work.

 

Now, regarding our experience within Switzerland: in the first project, there was no specific AI application. Our primary interlocutors were government officials responsible for designing and procuring services from external companies, and they wanted an ex-ante guide on how to proceed. In the second project, the authority had already set up a project with a company, using a version of generative AI customized by a private contractor. We tested, revised, and applied our methodology to this use case.

 

The AI design pipeline played a minimal role. The framework for the application had already been established with the company, and the focus was on defining the experimentation and potential future use. Our main partners were the government officials responsible for the project specifications. This was appropriate because such generative AI projects are not highly sophisticated on the technical side: the generative language model already exists, and only the user interaction and the workflow within the organization need to be adjusted. Discussions therefore took place primarily with government officials, who could redefine the specifications, which the computer scientists then implemented. The AI engine, a large language model, remained largely unchanged, with only minor adjustments to the interaction.

 

Dr. Feng Naikang: The context is always so complex, and the ways people interact with these models can be diverse. How can we set up a comprehensive framework or best practice to guide every different kind of AI practice or application?

 

Dr. Michele Loi: Yes. So again, let me explain what we did and how this handling of generative models emerged. Our 2021 approach, which I invite you to read on the AlgorithmWatch website (just search for the AlgorithmWatch impact assessment), was designed at a time when AI in practice meant machine learning for predictive models, or some sort of prediction-based response by a computer program. Generative AI was not yet in sight. Even though I didn't know the specific applications, I wanted to develop the framework in a general way.

 

Dr. Michele Loi: But because generative AI was now involved, I had to deal with it. How did I do it? I added three questions. Let me try to explain the logic of how we thought this would work.

So basically, there is a general framework. There are general aspects of accountability, transparency, attention to bias, and explanation that we didn't change. We improved and clarified some questions, but essentially they are the same questions you find in the 2021 edition. The logic is this: we try to achieve what I called in one of my publications design publicity, or design transparency, which means being extremely clear about the purposes of, and the reasons for, using an artificial intelligence system or, more broadly, an algorithmic system.

  1. Design Publicity and Translation Transparency:

    • Explaining how you translate those goals into operational terms, as well as all the normative constraints you want the system to satisfy: both pre-existing legal and ethical requirements and further requirements you consider important. This is called value by design. You try to identify ahead of time the normative requirements the system should satisfy, so that it can achieve its goal within the constraints of those requirements. We call this translation transparency.

    • We have questions about the goal and about what these constraints are.

  2. Operationalization and Accountability:

    • Then there is a second phase where we ask how you operationalize these: how you operationalize the goal for the AI and how you operationalize the normative constraints in practice.

    • Then we have a section about accountability. Who is in charge of setting the goals, operationalizing them, and checking that they are achieved?

  3. Performance Transparency:

    • And then we have a section on performance transparency. What are the mechanisms that you use to determine that the goal is achieved and is achieved better than before, or at a lower cost in some way? How do you verify that your normative constraints are satisfied?

 

So this is still the logical structure. These questions you can always ask, whether it is generative AI or not. But we also introduced some specific questions about generative AI, because our approach has a logic that is quite special. Most approaches I've seen ask all the questions every time, almost like an auditing process: you must go through every question, which makes the process very heavy. Our approach has a first checklist, and depending on how you answer it, you have to fill in different parts of a report.

 

Yes, you could say that the first part is a bit like a risk collector. Depending on which risk signals you activate by answering the triage, you then have a different job to do. That is why we try to assess the kinds of risks the system tends to pose: whether there are significant risks of bias or potential discrimination, risks related to automation, explainability, automation bias, loss of government control over key assets, and so on. Answering these questions is meant to activate specific reporting duties, so these are not always the same.
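To make this triage logic concrete, here is a minimal Python sketch of how such a dynamic checklist might be wired together. The questions, risk signals, and report sections below are illustrative assumptions for the sketch, not the actual wording of the AlgorithmWatch tool.

```python
# A minimal sketch of a dynamic ("triage") checklist, assuming hypothetical
# questions, risk signals, and report sections; none of the strings below are
# taken from the actual AlgorithmWatch tool.

# Each triage question, if answered "yes", activates one or more risk signals.
TRIAGE_QUESTIONS = {
    "Does the system make or support decisions about individuals?": ["bias"],
    "Would affected persons struggle to get an explanation of an output?": ["explainability"],
    "Do staff act on the system's outputs without human review?": ["automation_bias"],
    "Is a key asset or process controlled by an external provider?": ["loss_of_control"],
}

# Each risk signal triggers specific reporting duties (sections of the report).
REPORTING_DUTIES = {
    "bias": ["Translation transparency: goals and normative constraints",
             "Performance transparency: bias and discrimination checks"],
    "explainability": ["Performance transparency: explanation mechanisms"],
    "automation_bias": ["Accountability: who reviews and can override outputs"],
    "loss_of_control": ["Accountability: ownership and control of key assets"],
}


def required_report_sections(answers: dict[str, bool]) -> list[str]:
    """Collect the report sections triggered by the 'yes' answers to the triage."""
    signals = {s for question, yes in answers.items() if yes
               for s in TRIAGE_QUESTIONS[question]}
    sections: list[str] = []
    for signal in sorted(signals):
        for section in REPORTING_DUTIES[signal]:
            if section not in sections:
                sections.append(section)
    return sections


if __name__ == "__main__":
    answers = {question: False for question in TRIAGE_QUESTIONS}
    answers["Does the system make or support decisions about individuals?"] = True
    answers["Do staff act on the system's outputs without human review?"] = True
    for section in required_report_sections(answers):
        print("- must complete:", section)
```

The design point is the one Dr. Loi describes: only the risk signals you activate create further reporting work, so the process stays lean when the risks are low.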

 

For generative AI we were worried, and experience proved this, that people were not sufficiently careful about it; they tend to trivialize it. The fear was that they would answer without really perceiving those potential risks. So my idea was to introduce three further questions.

 

  1. Identifying Critical Use Cases:

    • The first question identified five or six cases (I don't remember exactly) of uses of generative AI that trigger the reporting duties I mentioned before. The logic was not to cover all possible uses but to identify some uses that should trigger more concern. Is this comprehensive? No, it is totally fallible, so there may be some use we didn't think about. The tool is a heuristic; it will never be complete. But if you think carefully about the more general questions, the ones that are not about generative AI, you should be able to address it.

    • For example, looking at question 20 of the triage (let me put this in the chat), it asks: do you use generative AI for the following purposes? Any of these triggers some reporting duties, transparency duties basically. If you answer yes, it triggers the need to produce a report, which includes the general questions we ask for every AI that we consider to pose a risk. Then there are specific questions triggered by the next two questions, on model modification and in-house development.

  2. Model Modification:

    • Have you modified or tweaked the model? If you answer yes, then it activates these reporting duties.

  3. In-house Development:

    • Have you built it in-house? In that case, you have even more duties of reporting.

 

Dr. Michele Loi: The second and third questions were about modifying or tweaking the model and about in-house development, respectively. These are meant to make the process adaptable. It is also very important that our theory of action relies on transparency for these tools. The idea is that you need to make these decisions public, maybe not to the entire population, but at least to someone else; there needs to be some public record of the decisions that were taken. And if you download the report, you can see what kind of requirements we have. Even if they are the older ones, you can get an idea of the intellectual style of the approach.
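Continuing the same illustrative sketch, the three generative-AI questions can be modeled as an extra layer of triggers on top of the general triage. Again, the use cases and duty names below are hypothetical placeholders; the real "question 20" list of uses is not reproduced here.

```python
# Illustrative continuation of the sketch above: the three generative-AI
# triggers. The use cases and duty names are hypothetical placeholders; the
# real "question 20" list of uses is not reproduced here.

# Hypothetical stand-in for the critical generative-AI use cases.
CRITICAL_GENAI_USES = {
    "drafting decisions or official communications to citizens",
    "summarizing case files used in decision-making",
    "answering legal or eligibility questions from the public",
}


def genai_reporting_duties(uses: set[str],
                           model_modified: bool,
                           built_in_house: bool) -> list[str]:
    """Return the extra reporting duties triggered by the generative-AI questions."""
    duties: list[str] = []
    if uses & CRITICAL_GENAI_USES:      # question 1: critical use cases
        duties.append("General report: goals, constraints, accountability")
        duties.append("Transparency duties specific to generative-AI use")
    if model_modified:                  # question 2: model modified or tweaked
        duties.append("Report on the modifications and their evaluation")
    if built_in_house:                  # question 3: in-house development
        duties.append("Full development documentation and testing record")
    return duties


# Example: a model customized by a contractor (not built in-house), used to
# draft communications, triggers the first two layers of duties only.
print(genai_reporting_duties(
    uses={"drafting decisions or official communications to citizens"},
    model_modified=True,
    built_in_house=False,
))
```

In the example call, a customized but externally developed model triggers the use-case and modification duties, while the heavier in-house documentation duties remain inactive.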

 

Dr. Feng Naikang: Great design! Scholars agree on the importance of transparency, explainability, responsibility, and accountability in guidelines. However, the focus and approach vary across contexts, practitioners, officials, and scholars. Where do you see the trade-offs between approaches, and where are the greatest gaps?

 

 

Dr. Michele Loi: At the level of algorithmic auditing and impact assessment, there are many differences between approaches. Everyone faces a trade-off between a lean, easy-to-understand tool that provides limited guidance and a more comprehensive, directive tool that supplements public administration with know-how, knowledge, and awareness; checklists are meant for that. In developing this, I always preferred the lean approach. In practice, however, we received many requests for clarification. This points to a fundamental gap in capacity: public administrations, even in wealthy countries like Switzerland, lack the know-how, partly because hiring skilled programmers is very expensive.

 

Dr. Michele Loi: Basically, what our approach requires is a complex set of interdisciplinary competencies and know-how. We got questions about privacy and questions about bias. In theory, this could only work if there is one person in the organization who knows how to frame these questions and who then goes to the lawyers, the privacy experts, and the computer scientists. However, these interdisciplinary skills are rare. That process was not in place when we tested the tool: one person answered all the questions, and that person did not have all the necessary competencies.

 

So, at the moment, our ideal approach does not seem realistic. The ideal is that we ask the right questions and the administration finds out how to answer them by consulting specific experts as needed; checklists serve this purpose. We tried to make our approach efficient by designing dynamic checklists that minimize the number of questions and trigger only the necessary follow-ups. However, this relies on the initial responses being correct.

 

Ultimately, developing this expertise within public administration is still crucial. Governments seeking to reduce human workforce costs with AI must still hire fewer but more skilled and expensive experts to ensure responsible and trustworthy oversight. Effective innovation complements human skills rather than replacing them, encouraging higher education and well-paid jobs requiring sophisticated abilities. This capacity-building is essential for progress.
