For a few weeks now, we haven’t stopped throwing challenges at ChatGPT to see just how far the AI can go at this point. It’s a mix of curiosity and unsettling news, and the latest chapter comes from medicine: the tool has managed to pass the medical licensing exam in the United States.
To be more exact, a team of researchers put the program to the test to measure its clinical reasoning skills using questions from the United States Medical Licensing Examination (USMLE). According to the authors of a study published in medRxiv:
We chose to test the generative language AI on USMLE questions because it is a high-stakes, three-step, comprehensive standardized testing program that covers all topics in clinicians’ fund of knowledge, spanning basic sciences, clinical reasoning, medical management, and bioethics.
The results could hardly be more surprising, considering that the language model was not trained on the version of the test used by the researchers, nor did it receive any additional medical training prior to the study, in which it answered a series of open-ended and multiple-choice questions. According to the authors of the work:
In this study, ChatGPT performed with >50% accuracy across all tests, exceeding 60% in most. The USMLE passing threshold, though it varies by year, is approximately 60%. Therefore, ChatGPT is now comfortably within the passing range. As the first experiment to reach this benchmark, we believe this is a surprising and impressive result.
Not only that. Following the results, the team believes the AI’s performance could improve with more prompting and interaction with the model. In fact, when the AI performed worse and gave less consistent answers, they believe it was partly due to gaps in the information available to it. As the study indicates:
Paradoxically, ChatGPT outperformed PubMedGPT (50.8% accuracy, unpublished data), a peer [language model] with a similar neural structure but trained exclusively on biomedical domain literature. We speculate that domain-specific training may have created greater ambivalence in the PubMedGPT model, as it absorbs real-world text from ongoing academic discourse that tends to be inconclusive, contradictory, or highly conservative or evasive in its language.
What’s next? The researchers suggest that AI may very soon become commonplace in healthcare settings, given the speed of progress in the industry. [IFLScience]