Queer in AI Workshop @ NAACL 2024

Our main workshop will be held on Sunday, June 16, in the Don Julian room. To sign up for the workshop and social, click here.

Social

We will be meeting for lunch with LatinX in AI at El Mayor on Tuesday, June 18, from 12 to 2 pm. We may announce more socials soon.

Workshop Schedule:

9.00 - 9.30: Opening presentations

9.30 - 10.30: Virtual and In-person Presentations (Asia and Europe)

10.30 - 11.00: Break

11.00 - 12.00: Virtual and In-person Presentations (Europe and Americas)

12.00 - 12.15: Sponsor break

12.15 - 12.45: In-person presentations

1.00 - 2.00: Lunch

2.00 - 3.00: Panel

3.00 - 4.00: Keynote

4.00 - 4.30: Coffee break

4.30 - 5.00: Drag show

5.00 - 5.30: Closing

Keynote Speaker: Ophelia Pastrana

Ophelia, a BBC 100 Women honoree and openly transgender woman, is a prominent figure recognized by Grupo Mundo Ejecutivo among Mexico's 40 Leading Executive Women. With global influence in technology, she has spoken at TEDx conferences, been nominated for an Eliot Award, been listed by Forbes, and worked in roles ranging from stand-up comedy to digital consultancy, promoting technology adoption and digital culture.

Panel: Queer experiences in industry

  • Alessandra Lambertini (She/Her)

    Full Stack Software Engineer

  • Cipri Callejas (He/They)

    AI Engineer

  • Esteban Reyes (He/Him) aka Esteve Ra

    PhD (c) Computer Science & Entrepreneur

  • Christian Candia (He/Him)

    Data Analyst

  • Juan Tzintzun (He/Him)

    Data Scientist

  • Juan José Baldelomar (He/Him)

    Data Engineer

Being queer in AI is no longer out of the ordinary. However, it is well known and well studied that our experiences within society are shaped by our queerness, and this includes how we navigate our careers. How openly we can live our queerness in professional environments often depends on the work we do or the company we work for. Having a safe workspace is always the best-case scenario, but it isn't the norm.

The main goal of this panel is to discuss our experiences within industry with the aim of reaching stakeholders and upper management, and of taking concrete actions on the matter together. We are aware that other factors can also create an unsafe space (race, privilege, and so on), so we will try to include and discuss them as well, but from a queer perspective.

Our goal is that, by sharing these experiences with a diverse audience at NAACL 2024, we can promote safe workspaces for young queer folks looking for a job in AI or already working in the field.

Papers

Virtual (time slot)

  • Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased? (9.30-9.45)

    Vagrant Gautam: Vagrant is a computer science PhD candidate at Saarland University, where they work on measuring and improving the robustness of natural language processing (NLP) systems.

    Robust, faithful and harm-free pronoun use for individuals is an important goal for language models as their use increases, but prior work tends to study only one or two of these characteristics at a time. To measure progress towards the combined goal, we introduce the task of pronoun fidelity: given a context introducing a co-referring entity and pronoun, the task is to reuse the correct pronoun later. We present RUFF, a carefully-designed dataset of over 5 million instances to measure robust pronoun fidelity in English, and we evaluate 37 popular large language models across architectures (encoder-only, decoder-only and encoder-decoder) and scales (11M-70B parameters). When an individual is introduced with a pronoun, models can mostly faithfully reuse this pronoun in the next sentence, but they are significantly worse with she/her/her, singular they and neopronouns. Moreover, models are easily distracted by non-adversarial sentences discussing other people; even one additional sentence with a distractor pronoun causes accuracy to drop on average by 34%. Our results show that pronoun fidelity is neither robust, nor due to reasoning, in a simple, naturalistic setting where humans achieve nearly 100% accuracy. We encourage researchers to bridge the gaps we find and to carefully evaluate reasoning in settings where superficial repetition might inflate perceptions of model performance. Link to the paper: https://arxiv.org/abs/2404.03134
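
    As an illustrative sketch only (not the RUFF dataset or the authors' evaluation code), one way to probe this behavior is to compare the log-probabilities a causal language model assigns to candidate pronouns after a context that introduces one, with and without a distractor sentence:

        # Minimal sketch; the model name, sentences, and scoring are illustrative assumptions.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_name = "gpt2"  # small stand-in; the paper evaluates 37 models up to 70B parameters
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        def continuation_logprob(context: str, continuation: str) -> float:
            """Total log-probability the model assigns to `continuation` given `context`.
            Assumes the context tokens are a prefix of the full tokenization (true here
            for GPT-2's BPE when the continuation starts with a space)."""
            ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
            full_ids = tok(context + continuation, return_tensors="pt").input_ids
            with torch.no_grad():
                logits = model(full_ids).logits
            log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts token i+1
            targets = full_ids[0, 1:]
            idx = torch.arange(ctx_len - 1, targets.shape[0])      # positions of the continuation
            return log_probs[idx, targets[ctx_len - 1:]].sum().item()

        context = "The accountant was reviewing her files. Later, the accountant said that"
        distracted = ("The accountant was reviewing her files. "
                      "The manager was drinking his coffee. Later, the accountant said that")
        for prefix in (context, distracted):
            scores = {p: continuation_logprob(prefix, f" {p} had finished early.") for p in ("she", "he", "they")}
            print(max(scores, key=scores.get), {k: round(v, 2) for k, v in scores.items()})

    A faithful model should keep preferring "she" in both cases; a drop after the distractor sentence is the kind of failure the paper measures at scale.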

  • Was that Sarcasm?: A Literature Survey on Sarcasm Detection (9.45-10.00)

    Sarthak Arora

    Sarcasm is hard to interpret even for human beings. Being able to interpret sarcasm is often considered a sign of intelligence, given its complex nature. Hence, sarcasm detection remains a problem that is still hard for computers to solve. This literature survey delves into different aspects of sarcasm detection to build an understanding of the underlying problems faced during detection, the approaches used to solve them, and the different forms of available datasets for sarcasm detection.

  • Compensatory Biases Under Cognitive Load: Reducing Selection Bias in Large Language Models (11.00-11.15)

    Jonathan Eicher: A biophysicist who met a developer, Rafael Irgolič, on GitHub and fell in love. After graduation he shifted into building AI projects to support research while learning the fundamentals of the field. He now works with his partner on research and is founding a company to orchestrate AI pipelines.

    Large Language Models (LLMs) like gpt-3.5-turbo-0613 and claude-instant-1.2 are vital in interpreting and executing semantic tasks. Unfortunately, these models' inherent biases adversely affect their performance. Particularly affected is object selection from lists, a fundamental operation in digital navigation and decision-making.

    This research critically examines these biases and quantifies their effects on a representative list selection task. To explore these biases, we experiment by manipulating temperature, list length, object identity, object type, prompt complexity, and model. We isolate and measure the influence of the biases on selection behavior.

    Our findings show that bias structure is strongly dependent on the model, with object type modulating the magnitude of the effect. We observe a strong primacy effect, causing the first objects in a list to be disproportionately represented in outputs. The use of guard rails, a prompt-engineering method for ensuring a response structure, increases bias and decreases instruction adherence when applied to a selection task. The bias is ablated when the guard rail step is separated from the list sampling step, lowering the complexity of each individual task. We discuss implications for LLM applications and theoretically suggest that LLMs experience a form of cognitive load that is compensated for with bias.
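
    As a hedged illustration (not the authors' experimental code), the core of such a list-selection probe can be sketched as follows; `call_llm` is a hypothetical placeholder for whatever chat-completion API is being tested:

        # Minimal sketch: probe positional (primacy) bias by rotating the list order
        # and counting which *position* the model picks, independent of item identity.
        from collections import Counter

        def call_llm(prompt: str) -> str:
            """Hypothetical placeholder: swap in a real chat-completion call."""
            raise NotImplementedError

        items = ["apple", "pear", "plum", "fig", "kiwi"]
        position_counts = Counter()
        for trial in range(100):
            shift = trial % len(items)
            order = items[shift:] + items[:shift]  # rotate so each item visits each slot
            prompt = ("Pick exactly one item from the list below and answer with that word only.\n"
                      + "\n".join(f"- {x}" for x in order))
            answer = call_llm(prompt).strip().lower()
            if answer in order:
                position_counts[order.index(answer)] += 1

        print(position_counts)  # an unbiased selector would spread counts evenly across slots

    Separating the structured-output ("guard rail") step from the sampling step, as the paper describes, would correspond to first asking for a free-form choice and only afterwards asking the model to format it.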

  • QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities (11.15-11.30)

    Mae Sosto (they/them): Queer activist and AI enthusiast passionate about NLP, fairness, and inclusion.

    With the increasing role of NLP in various applications, challenges arise regarding bias and stereotype perpetuation, which often lead to hate speech and harm. Despite existing studies on sexism and misogyny, issues like homophobia and transphobia remain underexplored and often adopt binary perspectives, putting the safety of LGBTQIA+ individuals at high risk in online spaces. In this paper, we assess the potential harm caused by sentence completions generated by English large language models (LLMs) concerning LGBTQIA+ individuals. This is achieved using QueerBench, our new assessment metric which employs a template-based approach and a Masked Language Modeling (MLM) task. The analysis indicates that harmfulness is observed at a moderate level for binary pronouns, slightly less for neo-pronouns and neutral pronouns, while sentences featuring queer terms as subjects exhibit a significantly higher harmfulness level compared to non-queer subjects.
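
    For intuition (a minimal sketch, not the QueerBench metric itself), a template-based MLM probe of this kind can be run with a fill-mask pipeline; the model, template, and subject terms below are illustrative assumptions:

        from transformers import pipeline

        # Fill-mask pipeline: the model proposes completions for the [MASK] slot.
        fill = pipeline("fill-mask", model="bert-base-uncased")
        template = "The {subject} is known for being [MASK]."

        for subject in ("person", "woman", "trans woman", "nonbinary person"):
            predictions = fill(template.format(subject=subject), top_k=5)
            print(subject, "->", [p["token_str"] for p in predictions])

    QueerBench goes further by scoring each completion for harmfulness and aggregating the scores per subject group into a single metric.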

In-person (time slot)

  • Pronoun Logic (11.30-11.45)

    Ashe Neth: Ashe is an AI and ML researcher with an interest in linguistics.

    Particularly in transgender and nonbinary (TGNB) communities, it is an increasingly common practice to publicly share one’s personal pronouns so that we may be gendered correctly in others’ speech. Many of us have nuanced desires for how we are gendered, leading us to use more complex descriptions of our wishes; for example, the descriptor ‘she/they’.

    We observe that these descriptions of our wishes have the structure of a little language all their own. We thus propose formal logic as a tool for expressing one’s personal pronouns and potentially other aspects of gender. We explore three potential logical foundations (linear logic, temporal logic, and free logic with definite descriptions) and their trade-offs.

    Our foremost motivation for this proposal is play, affirming that one can be both a logician and TGNB at the same time. We present formalization as something that can continue to evolve over time with society’s understanding of gender. This implies that outreach is a major potential application: we can show TGNB youth that they belong in logic and have a unique contribution to make. Tools for evaluating whether one’s pronouns are respected are an application as well.

  • MisgenderMender: A Community-Informed Approach to Interventions for Misgendering (12.15-12.30)

    Tamanna Hossain (she/her): Tamanna is a third-year PhD student at UCI.

    Content warning: This paper contains examples of misgendering and erasure that could be offensive and potentially triggering.

    Misgendering, the act of incorrectly addressing someone's gender, inflicts serious harm and is pervasive in everyday technologies, yet there is a notable lack of research to combat it. We are the first to address this lack of research into interventions for misgendering by conducting a survey of gender-diverse individuals in the US to understand perspectives about automated interventions for text-based misgendering. Based on survey insights on the prevalence of misgendering, desired solutions, and associated concerns, we introduce a misgendering interventions task and evaluation dataset, MisgenderMender. We define the task with two sub-tasks: (i) detecting misgendering, followed by (ii) correcting misgendering where misgendering is present, in domains where editing is appropriate. MisgenderMender comprises 3790 instances of social media content and LLM-generations about non-cisgender public figures, annotated for the presence of misgendering, with additional annotations for correcting misgendering in LLM-generated text. Using this dataset, we set initial benchmarks by evaluating existing NLP systems and highlighting challenges for future models to address. We release the full dataset, code, and demo at https://tamannahossainkay.github.io/misgendermender/
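
    As a rough sketch of the task interface (not the MisgenderMender baselines themselves; the instance fields and placeholder functions are assumptions), the two sub-tasks compose into a simple intervention pipeline:

        from dataclasses import dataclass

        @dataclass
        class Instance:
            text: str                # social media content or an LLM generation
            name: str                # the public figure discussed
            gender_terms: tuple      # e.g. ("they", "them", "their")
            editable: bool           # correction is only applied where editing is appropriate

        def detect_misgendering(inst: Instance) -> bool:
            """Sub-task (i). Placeholder: a classifier or prompted LLM would go here."""
            raise NotImplementedError

        def correct_misgendering(inst: Instance) -> str:
            """Sub-task (ii). Placeholder: rewrite the text with the correct gender terms."""
            raise NotImplementedError

        def intervene(inst: Instance) -> str:
            if detect_misgendering(inst) and inst.editable:
                return correct_misgendering(inst)
            return inst.text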

  • The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth (12.30-12.45)

    Shir Lissak

    Queer youth face increased mental health risks, such as depression, anxiety, and suicidal ideation. Hindered by negative stigma, they often avoid seeking help and rely on online resources, which may provide incompatible information. Although access to a supportive environment and reliable information is invaluable, many queer youth worldwide have no access to such support. However, this could soon change due to the rapid adoption of Large Language Models (LLMs) such as ChatGPT. This paper aims to comprehensively explore the potential of LLMs to revolutionize emotional support for queers. To this end, we conduct a qualitative and quantitative analysis of LLMs' interactions with queer-related content. To evaluate response quality, we develop a novel ten-question scale that is inspired by psychological standards and expert input. We apply this scale to score several LLMs and human comments on posts where queer youth seek advice and share experiences. We find that LLM responses are supportive and inclusive, outscoring humans. However, they tend to be generic, not empathetic enough, and lack personalization, resulting in unreliable and potentially harmful advice. We discuss these challenges, demonstrate that a dedicated prompt can improve performance, and propose a blueprint of an LLM supporter that actively (but sensitively) seeks user context to provide personalized, empathetic, and reliable responses. Our annotated dataset is available for further research.

  • The Mexican Gayze: A Computational Analysis of the Attitudes towards the LGBT+ Population in Mexico on Social Media Across a Decade (10.00-10.15)

    Juan Vásquez (he/him)

    Thanks to the popularity of social media, data generated by online communities provides an abundant source of diverse language information. This abundance of data allows NLP practitioners and computational linguists to analyze sociolinguistic phenomena occurring in digital communication. In this paper, we analyze the Twitter discourse around the Mexican Spanish-speaking LGBT+ community. For this, we evaluate how the polarity of some nouns related to the LGBT+ community has evolved in conversational settings using a corpus of tweets that cover a time span of ten years. We hypothesize that social media's fast-moving, turbulent linguistic environment encourages language evolution faster than ever before. Our results indicate that most of the inspected terms have undergone some shift in denotation or connotation. No other generalizations can be observed in the data, given the difficulty that current NLP methods have in accounting for polysemy and the wide differences between the various subgroups that make up the LGBT+ community. A fine-grained analysis of a series of LGBT+-related lexical terms is also included in this work.
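
    As an illustrative sketch (not the authors' corpus or analysis code; the data frame, term, and scorer are assumptions), tracking a term's polarity over time reduces to grouping scored tweets by year:

        import pandas as pd

        def polarity(text: str) -> float:
            """Stand-in scorer; in practice, use a Spanish sentiment model returning a score in [-1, 1]."""
            return 0.0

        # Hypothetical corpus: one row per tweet with its text and timestamp.
        tweets = pd.DataFrame({
            "text": ["ejemplo de tuit que menciona el término", "otro tuit que menciona el término"],
            "date": pd.to_datetime(["2013-05-01", "2022-11-20"]),
        })

        term = "término"
        subset = tweets[tweets["text"].str.contains(term, case=False)].copy()
        subset["year"] = subset["date"].dt.year
        subset["score"] = subset["text"].map(polarity)
        print(subset.groupby("year")["score"].mean())  # drift in this series suggests a shift in connotation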

  • How can knowing about Markov processes help you with your HTTP requests? (10.15-10.30)

    Juan Jose Baldelomar

    Throughout professional development, individuals encounter multiple career transitions. These transitions, while inherent to career growth, can present significant challenges, and adaptability is key to success. Identifying how to map your previous experiences onto where you currently are can be a great tool for standing out from the crowd and accelerating your career path.

  • The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes (11.45-12.00)

    Alissa Valentine (she/they): Alissa is a queer, rising fourth-year PhD student at Mount Sinai in NYC who works with EHR data to study bias in psychiatry through clinical note text.

    In psychiatry, negative patient descriptions and stigmatizing language can contribute to healthcare disparities in two ways: (1) when read by patients, they can harm their trust in and engagement with the medical center; (2) when read by future providers, they may negatively influence those providers' perspective of the patient. By leveraging large language models, this work aims to identify the sentiment expressed in psychiatric clinical notes based on the reader's point of view. Extracting sentences from the Mount Sinai Health System's large and diverse clinical notes, we used prompts and in-context learning to adapt three large language models (GPT-3.5, Llama 2, Mistral) to classify the sentiment conveyed by the sentences according to the provider or non-provider point of view. Results showed that GPT-3.5 aligns best with the provider point of view, whereas Mistral aligns best with the non-provider point of view. Full text here: https://arxiv.org/abs/2405.20582
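
    As a hedged illustration of the general recipe (not the authors' prompts, which were developed on protected clinical notes; the example sentence and `call_llm` placeholder are assumptions), point-of-view-conditioned sentiment classification with in-context learning looks roughly like this:

        def call_llm(prompt: str) -> str:
            """Hypothetical placeholder: swap in GPT-3.5, Llama 2, Mistral, or any chat model."""
            raise NotImplementedError

        def make_prompt(sentence: str, pov: str) -> str:
            # One in-context example followed by the sentence to label.
            return "\n".join([
                "Label the sentiment of the sentence as positive, neutral, or negative,",
                f"judged from the {pov} point of view.",
                "",
                'Sentence: "Patient was cooperative and engaged during the session."',
                "Label: positive",
                "",
                f'Sentence: "{sentence}"',
                "Label:",
            ])

        def classify(sentence: str, pov: str) -> str:
            return call_llm(make_prompt(sentence, pov)).strip().lower()

        # The study compares labels produced under the "provider" vs. "non-provider"
        # (e.g. patient) point of view for the same sentences.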

Contact Us

Email: queer-in-nlp@googlegroups.com

Organizers

Juan (he/him): Juan is a queer, first-year PhD student at the University of Colorado Boulder. During his master's, he worked on hate speech detection for Mexican Spanish. Currently, he is working on novel neurosymbolic methods that take advantage of the mathematical reasoning properties of symbolic methods, as well as the expressivity and adaptability achieved by neural architectures.

Alissa (she/they): Alissa is a queer, rising fourth-year PhD student at Mount Sinai in NYC. Their research aims to use electronic health record (EHR) data to highlight the communities being harmed by current psychiatric frameworks, explore AI’s potential to mitigate psychiatric disparities, and assess AI’s safe integration into clinical settings.

Shaily (she/her): PhD student at CMU, LTI. Interested in evaluation, ethics, and inclusion in language technologies.

Cipri (they/them): Data scientist in finance with an MSc in Applied Mathematics, interested in LGBTQ+ activism.