Natural language processing

GPT-4 passes Japan’s National Physical Therapy Examination

GPT-4 passes Japan’s National Physical Therapy Examination



summary
Summary

A new peer-reviewed study shows that OpenAI’s GPT-4 language model can pass the Japanese National Physical Therapy Examination without any additional training.

The research, published in the journal Cureus, tested GPT-4 on both text and visual questions. Japan’s physical therapist exam includes 160 general and 40 practical questions, testing memory, comprehension, application, analysis, and evaluation.

Researchers input 1,000 questions into GPT-4 and compared the answers to official solutions. GPT-4 passed all five test sections, correctly answering 73.4 percent of questions overall. However, the AI struggled with technical questions and those containing images or tables.

GPT-4 passes Japan's National Physical Therapy Examination
One of the exam questions that GPT-4 had to solve. | Image: Sawamura et al.

The model performed much better on general questions (80.1% correct) than practical ones (46.6% correct). Similarly, GPT-4 handled text-only questions (80.5% correct) far better than those with pictures and tables (35.4% correct). These findings align with previous research on GPT-4’s visual comprehension limitations.

Ad

GPT-4 passes Japan's National Physical Therapy Examination
In all tests, GPT-4 had much more difficulty answering practical questions. | Image: Sawamura et al.

Interestingly, question difficulty and text length didn’t significantly impact GPT-4’s performance. The model also performed well with Japanese input, despite being primarily trained on English data.

Multimodal models could bring further improvements

While the study shows GPT-4’s potential in clinical rehabilitation and medical education, the researchers caution that it doesn’t answer all questions correctly. They stress the need to evaluate newer versions and the model’s capabilities in written and reasoning tests. Multimodal models such as GPT-4o could potentially yield better results in visual comprehension.

Large language models have shown promise in medicine for some time. Specialized versions like Google’s Med-PaLM 2 and Med-Gemini aim to outperform general models like GPT-4 in medical tasks. Meta also has Llama 3-based models designed for the medical sector.

GPT-4 passes Japan's National Physical Therapy Examination
Image: via Google

However, it is likely to be a long time before medical AI models are widely used in practice. Even current benchmarks leave too much room for error, which is particularly critical in medical contexts. As in many other applications where precision and correctness are key, significant improvements in reasoning skills seem necessary to safely integrate these models into everyday practice.

GPT-4 passes Japan's National Physical Therapy Examination

Source link