How to account for automated decision systems in the public domain?


by Mirthe Dankloff


Automated decision systems are increasingly being used in various public domains. Well-known examples include algorithms used for police screening, fraud detection, and granting subsidies. Incorrect predictions can be highly detrimental for citizens, especially when they are more frequent or systematic for certain social groups. Public authorities therefore need to balance the risks and benefits to protect their citizens. One way to do this is by making error analysis transparent and understandable, because it is an important determinant of the design and deployment choices made for an algorithm.


What are the main challenges for using algorithms in the public domain?


An automated decision system never stands on its own. It is always part of a bigger socio-technical ecosystem in which various stakeholders are involved in different phases of the algorithmic lifecycle. Depending on their role, some stakeholders will be more involved in the design, testing, implementation, deployment, or maintenance phases of a system. For instance, engineering teams may focus more on the internal mechanisms of the algorithm in the design and testing phases, while a team of communication officers may be involved in disclosing information about the system to end-users in the deployment phase.


In the public domain, governmental bodies regularly outsource the design, implementation, and test phases to engineers in private companies. Such public-private collaborations can make the identification of roles and responsibilities more complex. This complexity is referred to as “the problem of many hands”: when everybody is accountable for a small part of the lifecycle, nobody feels accountable for the whole process. This makes it difficult to pinpoint who is accountable when something goes wrong.


This challenge of accountability shows that there is a larger power structure at play. Outsourcing processes to industrial partners means that certain important decisions about the design and safety of a system, and thus the responsibilities that come with them, are shifted towards teams of engineers within companies whose work is harder to publicly scrutinize. Determining the important decisions for a system, such as what is an acceptable error rate or what are acceptable error differences between subpopulations, can have serious ethical consequences for citizens. These decisions must therefore not rely only on engineering teams, but also on (public) domain experts and even citizens.


Providing insights on error analysis and design choices


Being a potential ‘fraudster’ is not a property that people intrinsically lack or possess but a social construct designed by stakeholders who determined what ‘fraud suspicion’ entails. Setting the score threshold above which someone is flagged is therefore also an important design choice, and it can lead to different outcomes. In the context of predictive policing, setting the threshold too high might mean that a real criminal is missed. Setting the threshold too low, as in the fraud detection example, could mean that certain populations systematically have a higher chance of being wrongly flagged as potential fraudsters. If a citizen has been flagged as a potential fraudster, they have the right to know why they were deemed suspicious and based on which criteria. Identifying and auditing the relevant error metrics to compare is therefore very important when making design choices.
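The trade-off between the two kinds of error can be made concrete with a minimal sketch. The scores, labels, and threshold values below are invented purely for illustration and do not come from any real system; the function name `error_rates` is likewise hypothetical. The sketch assumes that a person is flagged when their score is at or above the threshold.

```python
def error_rates(scores, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) for one threshold.

    scores: risk scores between 0 and 1 (higher means more suspicious)
    labels: 1 for a true case, 0 for not a case
    Assumes the data contains at least one true case and one non-case.
    """
    pairs = list(zip(scores, labels))
    # Non-cases that are wrongly flagged.
    false_positives = sum(1 for s, y in pairs if s >= threshold and y == 0)
    # True cases that are missed.
    false_negatives = sum(1 for s, y in pairs if s < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    return false_positives / negatives, false_negatives / positives


# Made-up toy data: ten people, four of whom are true cases.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

for threshold in (0.35, 0.85):
    fpr, fnr = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: wrongly flagged {fpr:.2f}, missed {fnr:.2f}")
```

With this toy data, the low threshold of 0.35 misses no true cases but wrongly flags half of the people who are not a case. The high threshold of 0.85 wrongly flags only one in six of them, but misses three out of four true cases. Neither setting is neutral: choosing between them is a decision about which error is more acceptable.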


Depending on the domain and task at hand, different types of error carry different consequences, for example when a true case is incorrectly identified as ‘not a case’. Comparing only one error metric can therefore hide unequal treatment. Consider an algorithm used for subsidy approval that treats male and female applicants equally when they actually have ‘good scores’. At the same time, it can be more inclined to incorrectly assign good scores to males who actually have bad scores and who therefore should not qualify. This means that males would still have an overall advantage compared to females. Therefore, design choices based on different types of error analysis and threshold parameters should be identified and audited, so that these can be publicly scrutinized.
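The subsidy example can be sketched in a few lines by computing error metrics per group rather than for the population as a whole. The records below are invented for illustration only, and the function name `rates_by_group` and the record layout `(group, qualified, approved)` are assumptions made for this sketch, not part of any existing tool.

```python
from collections import defaultdict


def rates_by_group(records):
    """Compute true and false positive rates per group.

    records: iterable of (group, qualified, approved) tuples, where
    qualified is the applicant's actual status and approved is the
    algorithm's decision. Assumes every group contains at least one
    qualified and one unqualified applicant.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, qualified, approved in records:
        if qualified:
            key = "tp" if approved else "fn"
        else:
            key = "fp" if approved else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # Share of qualified applicants who were approved.
            "tpr": c["tp"] / (c["tp"] + c["fn"]),
            # Share of unqualified applicants who were approved anyway.
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
        }
    return rates


# Made-up toy data: eight applicants per group, four of them qualified.
records = (
    [("male", True, True)] * 3 + [("male", True, False)] * 1
    + [("male", False, True)] * 2 + [("male", False, False)] * 2
    + [("female", True, True)] * 3 + [("female", True, False)] * 1
    + [("female", False, True)] * 1 + [("female", False, False)] * 3
)

for group, r in rates_by_group(records).items():
    print(f"{group}: approved when qualified {r['tpr']:.2f}, "
          f"approved when unqualified {r['fpr']:.2f}")
```

In this toy data both groups are approved at the same rate when they are qualified (0.75), so a comparison on that metric alone would suggest equal treatment. The rate at which unqualified applicants are approved differs, however: 0.50 for males against 0.25 for females. Which of these metrics should be equal across groups is itself a design choice that deserves public scrutiny.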


Conclusion

While several attempts have been made to regulate algorithms via higher-level legal and ethical guidelines, most of them remain too generic and fail to specify who is responsible for what when issues occur. At the Civic AI lab, we aim to empower citizens by translating these higher-level guidelines into practical methods. These methods include making error analysis understandable for civil servants and other decision-makers in the public domain. Based on error metrics and the task at hand, public authorities should balance the risks and benefits so that equal treatment and non-discrimination can be ensured.