In this blog post, we’d like to give you an overview of what, in our opinion, were the most notable speakers and research topics of the AIES conference. Over three days, we tried to absorb as much information as possible by attending virtual keynotes and poster sessions, whilst chatting with researchers from all over the world. This was quite a challenge at times, given that parts of the programme took place at night for us because the conference was hosted in the US.
Keynote sessions are arguably the centerpiece of most conferences, and AIES 2021 fully delivered in this respect. It is no exaggeration to say that every keynote was exciting and highly relevant to the community's biggest debates.
The first keynote was given by Timnit Gebru and was interesting for a multitude of reasons. She talked about her recent firing as co-lead of Google's Ethical AI team, which, albeit a tad scary, provided a fantastic insight into Google's practices for protecting its brand, complete with a playbook for discrediting dissenting scientists such as Gebru herself. She then placed this issue in the wider context of the AI landscape, and by extension the world, to show how society is being shaped by the interests of large corporations.
In the paper she was fired over, she criticised the selectivity of large language models in terms of which languages have access to them. However, her example of Google's use of large language models was relatively harmless compared to the one she gave later on: Facebook’s role in spreading misinformation that in part caused the Tigray genocide. She described how a country such as Ethiopia, where more than 80 languages are spoken, suffers from the exclusion of these languages and is thus caught in a perpetual loop of marginalisation.
Beyond marginalisation, this points to a broader issue with large models deployed by corporations. The data Google has access to sits behind a paywall for everyone else. Papers that use Google Street View can rarely, if ever, release their datasets so that others can reproduce their results. On top of that, papers on computer vision methods that require labelling rely on resources that are not available to everyone. This is why Timnit Gebru's talk resonated so much: the issues run much deeper than the actions of large corporations alone. If we want to use AI to combat inequality, we should start by becoming independent, both in the literal sense of not relying on Google data and in the figurative sense of moving away from supervised learning as the default baseline.
The second keynote that stood out was given by associate law professor Ifeoma Ajunwa. During an extensive Q&A session, Dr. Ajunwa talked about how automated decision-making can produce (unintentionally) biased results.
Working at the intersection of law and technology, with a particular focus on the ethical governance of workplace technology, Dr. Ajunwa knew how to inspire a multidisciplinary audience with a series of memorable quotes. The first was “Move slow and audit things”, a play on the famous motto “Move fast and break things”, meaning that auditing can make systems more accountable. She also addressed the “veil of ignorance” in automated decision-making and proposed creating “safe harbors” for AI regulation.
When the audience asked how to address the veil of ignorance, Dr. Ajunwa was loud and clear: one has to include citizens, especially minorities, in the design of a system. However, including minorities does not mean that researchers should “remain in their armchairs while they think of ways to put themselves in the place of the least advantaged”. Inclusion requires a more active approach, involving people in feedback sessions and interviews throughout the lifecycle of a system. This brings us to her idea of “safe harbors” for AI regulation. These harbors can function as “time-spaces” in which companies or other stakeholders are allowed to process feedback and correct biases where needed. Dr. Ajunwa warned that the incentive to hide problems grows when penalties follow too directly. Safe harbors can thus create a protected zone in which stakeholders are more willing to rectify their mistakes.
Another highly relevant debate surrounds the use of benchmark datasets in research and development. The 2020 paper Data and its (dis)contents summarizes a cultural problem in how the machine learning community collects and uses datasets. Part of this critique centers on the tunnel-vision focus on benchmark datasets and task leaderboards. Arvind Narayanan's keynote, titled The Ethics of Datasets: Moving Forward Requires Stepping Back, can be seen as the reasonable "middle ground" stance in this debate. The talk could just as well have been titled In Defense of Benchmarking Practices without losing any of its validity.
Arvind begins by detailing the six major roles that benchmark datasets play in the machine learning field, and how each role brings its own benefits and harms. He argues that benchmark scores are meaningless as a measure of absolute performance but can still be a good indicator of relative model performance on a specific task. With that in mind, he urges the community to fix the issues with benchmarking practices while being careful not to "throw the baby out with the bathwater". He also emphasizes the importance of distinguishing between benchmark datasets used in scientific research and datasets used in production. While benchmark datasets are important for driving scientific research, he argues that they are not suited to being deployed out of the box in production. The risk of harm from algorithms used by companies or governments is much higher than from datasets used in scientific research. He therefore recommends governance for datasets used in such real-world contexts, an issue he promises to address in his next paper.
The format of scientific conferences changed drastically over the past year as a result of the COVID-19 pandemic. A positive effect has been that these events have become more accessible, which benefits diversity and inclusivity within the AI research community. A downside has been fewer opportunities to meet and interact informally. The poster sessions hosted by the AIES conference were a welcome exception to this. The conference provided a virtual room on its online platform for each poster presentation. Apart from the benefit of being able to see the content (poster and presentation) at any time during the conference, this setup allowed for more valuable interactions with the authors. Without the usual time constraints, questions could be asked at any time and discussed at length over video calls. This allowed for a more open discussion, which was conducive to a better understanding of the work presented. Moreover, it made the audience less anonymous, providing a more rewarding experience for the presenting researchers.
Paper Sessions
Diagnosing Bias (Session #1)
During the ‘Diagnosing Bias’ paper session, intersectional biases, algorithmic fairness, gender bias, and under-representation were discussed. One of the questions that arose was whether the AI tools currently used to decide societal issues are robust enough for high-stakes scenarios, such as deciding whether a person would be a good immigrant or predicting recidivism. Should we be relying on AI tools to help make these decisions? Do we fully understand the biases that are embedded in the data and potentially propagated or amplified? By and large, we do not have answers to these questions. However, solutions have been proposed to counter some of these issues, for example training algorithms to satisfy fairness criteria. Nevertheless, we cannot ignore social and structural problems, such as some groups being more heavily policed than others, or facial recognition tools, whose accuracy varies by gender and skin color, being used more in certain neighborhoods. What is clear is that the people affected by decisions made with AI-supported tools should not only be more involved in the design but also be better represented in the data.
Formal Implementation of Fairness Criteria (Session #8)
During the ‘Formal Implementation of Fairness Criteria’ paper session, the authors surfaced potential issues with some of the most widely used fair AI practices and proposed new methods to address them.
Human-in-the-loop frameworks promise to mitigate automated bias by deferring to a human expert in situations of high uncertainty. However, as Keswani et al. note, experts are not immune to their own biases. They therefore propose a framework that aims to maximise fairness by training a system that can defer to one of multiple experts, while accounting for the additional costs and the experts' availability.
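To make the idea of deferral concrete, here is a minimal Python sketch under our own simplifying assumptions; it is not the learned framework of Keswani et al. The model's prediction is kept when its confidence clears a threshold; otherwise the case is routed to whichever available expert has the lowest estimated error for the individual's group plus a weighted consultation cost. The `Expert` class, the per-group error estimates, `threshold` and `cost_weight` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    est_error: dict  # hypothetical estimated error rate per group, e.g. {"A": 0.1}
    cost: float      # hypothetical cost of consulting this expert
    available: bool = True


def route(model_confidence, group, experts, threshold=0.8, cost_weight=0.5):
    """Return "model" if the model is confident enough, else the chosen expert's name.

    Hypothetical rule: defer only when confidence < threshold, then pick the
    available expert minimising (estimated error for this group + weighted cost).
    """
    if model_confidence >= threshold:
        return "model"
    candidates = [e for e in experts if e.available]
    if not candidates:
        return "model"  # no expert available: fall back to the model
    best = min(candidates, key=lambda e: e.est_error[group] + cost_weight * e.cost)
    return best.name


# Two hypothetical experts, each biased against a different group.
experts = [
    Expert("alice", {"A": 0.05, "B": 0.30}, cost=0.2),
    Expert("bob", {"A": 0.15, "B": 0.10}, cost=0.2),
]
print(route(0.95, "B", experts))  # model (confident enough, no deferral)
print(route(0.60, "A", experts))  # alice
print(route(0.60, "B", experts))  # bob
```

Because each expert's error is tracked per group, an expert who is biased against a group is simply avoided for that group. In the framework described in the paper, this trade-off is learned during training rather than hand-coded as it is here.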
Outlier detection algorithms are meant to detect statistical anomalies in order to identify criminal activity, fraudulent behaviour and the like. Oftentimes, though, societal minorities appear as statistical minorities. Shekhar et al. show that as the sample size of a group decreases, its overall outlier score tends to increase. To address this issue, they propose FairOD, an outlier detection algorithm whose decisions are independent of protected group membership while remaining close to the ground truth, flagging the truly high-risk samples in each group.
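The effect Shekhar et al. describe can be reproduced in a toy setting. The sketch below assumes a simple k-nearest-neighbour distance score on synthetic 1-D data, which is our own choice and not one of the detectors from the paper. Both groups have the same spread, but one is ten times smaller. Flagging the top 10% within each group is a crude post-hoc fix that equalises flag rates; it is not FairOD, which is a learned detector.

```python
import random


def knn_scores(points, k=5):
    """Outlier score: mean distance to the k nearest other points (1-D data)."""
    scores = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_global(scores, rate):
    """Flag the top `rate` fraction of all points, ignoring group membership."""
    n_flag = max(1, round(rate * len(scores)))
    cutoff = sorted(scores, reverse=True)[n_flag - 1]
    return [s >= cutoff for s in scores]


def flag_per_group(scores, groups, rate):
    """Crude post-hoc fix: flag the top `rate` fraction within each group."""
    flags = [False] * len(scores)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        n_flag = max(1, round(rate * len(idx)))
        for i in sorted(idx, key=lambda j: scores[j], reverse=True)[:n_flag]:
            flags[i] = True
    return flags


def flag_rate(flags, groups, g):
    """Fraction of group g's members that were flagged."""
    members = [f for f, gi in zip(flags, groups) if gi == g]
    return sum(members) / len(members)


# Synthetic data: both groups have the same spread, but the minority group
# has ten times fewer samples, so its points sit in a sparser region.
rng = random.Random(0)
points = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(10.0, 1.0) for _ in range(50)]
groups = ["majority"] * 500 + ["minority"] * 50

scores = knn_scores(points)
for name, flags in [("global", flag_global(scores, 0.1)),
                    ("per-group", flag_per_group(scores, groups, 0.1))]:
    print(name, flag_rate(flags, groups, "majority"), flag_rate(flags, groups, "minority"))
```

On this toy data, the global threshold flags a far larger share of the minority group, even though no anomalies were planted in either group. Per-group thresholding equalises flag rates, but it ignores whether one group genuinely contains more high-risk cases, which is why FairOD also aims to stay close to the ground truth.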
Finally, methods that equalise error rates across groups have the caveat that they can artificially inflate the error rates of easier-to-predict groups. This is undesirable in social welfare applications where all groups are disadvantaged (e.g. domestic harm prediction). With their Minimax Group Fairness framework, Diana et al. demonstrate that fairness can instead be achieved by minimising the largest group error rate, without sacrificing accuracy.
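The difference between the two objectives shows up even in a toy example. In the sketch below, the per-group error rates of three candidate classifiers are invented numbers. Choosing from a fixed shortlist is also a deliberate simplification, since Diana et al. train models directly against the minimax objective. Equalising error rates picks the model with the smallest gap between groups, whereas the minimax criterion picks the model with the lowest worst-group error.

```python
# Hypothetical per-group error rates for three candidate classifiers.
candidates = {
    "m1": {"A": 0.10, "B": 0.25},
    "m2": {"A": 0.22, "B": 0.22},
    "m3": {"A": 0.12, "B": 0.20},
}


def error_gap(errors):
    """Difference between the worst and best group error rates."""
    return max(errors.values()) - min(errors.values())


def worst_group_error(errors):
    """The largest error rate over all groups (the minimax objective)."""
    return max(errors.values())


equalised = min(candidates, key=lambda m: error_gap(candidates[m]))
minimax = min(candidates, key=lambda m: worst_group_error(candidates[m]))
print(equalised, candidates[equalised])  # m2 {'A': 0.22, 'B': 0.22}
print(minimax, candidates[minimax])      # m3 {'A': 0.12, 'B': 0.2}
```

Here the equal-error model m2 raises group A's error from 0.12 to 0.22 and group B's from 0.20 to 0.22 compared with m3, so enforcing equality leaves both groups worse off.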