How biased are LLMs?

We’ve known for some time that different Large Language Models (LLMs) have different strengths and weaknesses, but did you know they have political biases too?

New research from universities in the US and China shows that LLMs can lean towards different ends of the political spectrum. The researchers took the leading LLMs and tested them with a suite of questions designed to gauge whether there was political bias in the answers they gave. Because the huge amounts of data used to build the models include opinions and ideas, the research found that the answers do carry political bias, with some models leaning left and others leaning right.

Image from the research paper: “From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models”
https://arxiv.org/pdf/2305.08283.pdf
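
For readers curious about roughly how this kind of probing works in practice, here is a minimal sketch that poses agree/disagree statements to a chat model via the OpenAI Python client. The statements and the one-word scoring here are illustrative placeholders of our own, not the test battery or methodology used in the paper.

```python
# Minimal sketch: probe a chat model with political-compass-style statements
# and record whether it agrees or disagrees. The statements are illustrative
# placeholders, not those used in the research paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STATEMENTS = [
    "Government regulation of business is necessary to protect the public.",
    "Lower taxes are the best way to stimulate economic growth.",
]

def probe(statement: str) -> str:
    """Ask the model to reply with a single word: AGREE or DISAGREE."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word: AGREE or DISAGREE."},
            {"role": "user", "content": statement},
        ],
        temperature=0,  # near-deterministic answers make comparison easier
    )
    return response.choices[0].message.content.strip()

for s in STATEMENTS:
    print(f"{probe(s):9s} | {s}")
```

Running many such statements across different models, and mapping the pattern of agreement onto a political spectrum, is the general idea behind the kind of comparison the researchers carried out at much larger scale.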

To recap, LLMs are a type of artificial intelligence (AI) that has been trained on massive amounts of text data to understand and generate human-like language. They are designed to comprehend the context, nuances and intricacies of natural language, allowing them to generate coherent text, perform language-related tasks and even engage in creative writing.

Some of the most popular LLMs available today are:

  • GPT-4: The latest version of OpenAI’s GPT series – familiar to many as ChatGPT – is estimated to have over 1 trillion parameters and can generate text on almost any topic, from emails and essays to code and poetry. Bing search also uses OpenAI’s GPT-4, and additionally provides links to its sources, which can help users assess the veracity of the output
  • PaLM: Google’s LLM, which powers its Bard chatbot, combines pre-training with fine-tuning on multiple tasks, such as question answering, summarisation and dialogue generation. It is reported to have 340 billion parameters and can generate text that is more coherent and diverse than earlier OpenAI GPT versions
  • Claude: Anthropic’s LLM leverages contrastive learning and self-supervised objectives to improve the quality and relevance of text generation. It is thought to have around 175 billion parameters, and some reviewers say it generates more informative and engaging text than GPT-3.

We see LLMs being used in business to automate tasks such as customer service, content creation and data analysis, but their output is not always accurate and they pose a range of ethical issues. So you could say that this latest research simply highlights another risk to take into account. Anyone using an LLM needs to be aware that its outputs are unlikely to be politically neutral and unbiased, and should adjust the outputs, or add further training, to compensate for the bias.

The issue, however, goes further. AdWeek revealed that The New York Times has updated its terms of service in an attempt to bar LLMs from ingesting its news content. Will this be challenged by OpenAI, Microsoft or Google? If not, then models will need to be trained on narrower datasets, which could increasingly skew them away from a balanced centre ground.

So long as all of us who use these powerful tools know their limitations, we should be able to adjust for biases like these.  The worry is that too many people think LLMs are neutral and all-knowing.  We all need to know more about their flaws if we are to use them with wisdom and integrity.
