
What to learn

In this reflection activity, you can start thinking about how to evaluate claims made about AI, ranging from casual claims made in conversation, through media reports, to systematic reviews. Here are some questions to ask:

  1. What relevance does the claim have for you? Are you trying to make practical decisions for yourself, shape policy, or just develop a more rounded outlook on Artificial Intelligence? What current knowledge do you have that would enable you to evaluate such a claim?
  2. What kind of AI is this claim about? It may not be clear from the context of the claim, such as the title of an article or the abstract of a paper. There is a big difference between generative AI (and tools based on Large Language Models) and other kinds of AI.
  3. What period of time is the claim relevant to? Broadly, is it based on information from before the release of ChatGPT (30 Nov 2022) or after? For academic papers, the publication date or even the submission date is not a sufficient signal. You have to look at the methods section. Even papers on “AI” published in 2024 may be based on work done by early 2023.
  4. If the claim is about generative AI or Large Language Models, it is essential to review exactly which versions of which models the claim is based on.
  5. Finally, it is not sufficient to evaluate the claim being made; you must also evaluate the context in which it is being made. Does the author or publication have an explicit or implicit objective of advocating for a particular position? Does it fail the tests outlined in the Code of Conduct for AI Literacy and Policy Practitioners?

What to do

<aside> <img src="/icons/help-alternate_gray.svg" alt="/icons/help-alternate_gray.svg" width="40px" />

Review these scenarios and discuss the questions.

The scenarios were selected to provide a range of different types of claims made about generative AI and Artificial Intelligence in general. Choose one or two.

</aside>

Scenario 1: Casual claim citing a study

This claim appeared in a blog post widely circulated on social media. It links to another blog post citing a study. Consider different ways of evaluating the claim. What sort of knowledge is required to confirm or disconfirm this claim?

As an Australian study recently showed, ChatGPT does not know how to summarize, only shorten. So far, summarizing remains something only humans do well. (November 2024)

https://www.arthurperret.fr/blog/2024-11-14-student-guide-not-writing-with-chatgpt.html

Scenario 2: Claims published about reliability of “chatbots” for medical diagnosis

A report on a recently published study was posted in the New Scientist under the headline “AI chatbots fail to diagnose patients by talking with them”. These are the opening paragraphs:

Advanced artificial intelligence models score well on professional medical exams but still flunk one of the most crucial physician tasks: talking with patients to gather relevant medical information and deliver an accurate diagnosis.

“While large language models show impressive results on multiple-choice tests, their accuracy drops significantly in dynamic conversations,” says Pranav Rajpurkar at Harvard University. “The models particularly struggle with open-ended diagnostic reasoning.”

How would you evaluate this claim?