<aside> <img src="/icons/list_gray.svg" alt="/icons/list_gray.svg" width="40px" /> Contents
</aside>
In this reflection activity, you can start thinking about how to evaluate claims made about AI, ranging from casual remarks in conversation, through media reports, to systematic reviews. Here are some questions to ask:
<aside> <img src="/icons/help-alternate_gray.svg" alt="/icons/help-alternate_gray.svg" width="40px" />
Review these scenarios and discuss the questions.
The scenarios were selected to cover a range of different types of claims made about generative AI, and about artificial intelligence in general. Choose one or two.
</aside>
This claim appeared in a blog post widely circulated on social media. It links to another blog post citing a study. Consider different ways of evaluating the claim. What sort of knowledge is required to confirm or disconfirm it?
As an Australian study recently showed, ChatGPT does not know how to summarize, only shorten. So far, summarizing remains something only humans do well. (November 2024)
https://www.arthurperret.fr/blog/2024-11-14-student-guide-not-writing-with-chatgpt.html
A report on a recently published study was posted in New Scientist under the headline "AI chatbots fail to diagnose patients by talking with them". These are the opening paragraphs:
Advanced artificial intelligence models score well on professional medical exams but still flunk one of the most crucial physician tasks: talking with patients to gather relevant medical information and deliver an accurate diagnosis.
“While large language models show impressive results on multiple-choice tests, their accuracy drops significantly in dynamic conversations,” says Pranav Rajpurkar at Harvard University. “The models particularly struggle with open-ended diagnostic reasoning.”
How would you evaluate this claim?