To be presented at NAACL 2024 🇲🇽

Three papers have been accepted to the NAACL 2024 main conference!

  • In Mitigating Bias for Question Answering Models by Tracking Bias Influence, we present an approach to mitigating the bias of multiple-choice QA models. Based on the intuition that a model tends to become more biased when it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. We then use the detected bias level as an additional optimization objective, forming a multi-task learning setup alongside the original QA task (a minimal sketch of this objective follows the list).
  • In Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models, we demonstrate that an attacker can inject backdoors by contributing only a handful of malicious instructions among thousands of gathered examples, controlling model behavior purely through data poisoning. With such instruction attacks, the attacker achieves over 90% attack success rate on four commonly used NLP datasets, and the resulting backdoors persist and transfer zero-shot to 15 diverse datasets.
  • In Instructional Fingerprinting of Large Language Models, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text whenever the key is present. Results on 11 widely used LLMs show that this approach prevents publisher overclaim, remains robust against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.
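
For readers curious what the multi-task objective in the bias-mitigation paper might look like in practice, here is a minimal, illustrative sketch. The function name, the auxiliary bias-prediction head, and the weighting term `lambda_bias` are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def multitask_loss(qa_logits, qa_labels, predicted_bias, measured_bias, lambda_bias=0.5):
    """Illustrative multi-task objective: solve the QA task while also
    regressing the bias level measured for each training instance.

    qa_logits:      (batch, num_choices) scores over answer options
    qa_labels:      (batch,) index of the correct option
    predicted_bias: (batch,) bias level predicted by an assumed auxiliary head
    measured_bias:  (batch,) bias level measured via influence on another instance
    lambda_bias:    assumed weight balancing the two tasks
    """
    qa_loss = F.cross_entropy(qa_logits, qa_labels)        # original QA objective
    bias_loss = F.mse_loss(predicted_bias, measured_bias)  # auxiliary bias objective
    return qa_loss + lambda_bias * bias_loss

# Toy usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    batch, num_choices = 8, 4
    loss = multitask_loss(
        qa_logits=torch.randn(batch, num_choices),
        qa_labels=torch.randint(0, num_choices, (batch,)),
        predicted_bias=torch.rand(batch),
        measured_bias=torch.rand(batch),
    )
    print(loss.item())
```
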
Presenting at NAACL 2021 🌎
We present our demo system EventPlus at NAACL 2021; the live system and code have been released. Check it out!