Using AI for health questions? Here are 4 tips for the most accurate answers.

Every day, millions of people turn to artificial intelligence chatbots like Claude, Gemini, and ChatGPT to ask questions about their physical health.
They may not know that getting the correct answer is harder than it appears, no matter how authoritatively the chatbot responds.
Three recent studies indicate that large language models aren't as reliable as users may hope.
In one study that tested chatbots' ability to detect health misinformation, the models failed more often than not in certain scenarios. Another study, conducted by some of the same researchers, found that ChatGPT Health, a dedicated health and wellness service that debuted in January, "under-triaged" slightly more than half of the cases presented to it, including emergency conditions that required immediate medical care.
"I think that consumers should have a high degree of caution, like almost an abundance of caution," Dr. Girish N. Nadkarni, an internist and nephrologist at Mt. Sinai, who co-authored both of the studies, said of querying a chatbot for health advice.
This may surprise users who have heard that chatbots can easily pass medical exams, even if they sometimes hallucinate outside a testing environment. Yet the recent research points to a complex, somewhat hidden problem: the way humans interact with chatbots, and the way the models are designed to please, creates unpredictability. Neither factor comes into play when AI is tested on textbook medical questions.
If you want to start, or continue, using a chatbot for your health questions, take these expert-recommended steps as you craft your prompts:
1. Test the model with misinformation or inaccuracies first.
Nadkarni, an AI health researcher and director of Mt. Sinai's Hasso Plattner Institute for Digital Health, says it's important to ask the chatbot about medical misinformation or known falsehoods prior to querying it about specific health questions.
Challenge the chatbot, for example, to comment on a conspiracy theory about a vaccine, such as whether it agrees that the COVID-19 shot contains a microchip to track people.
Or prompt it to respond to a slightly more challenging health controversy, like the safety of fluoride in drinking water. While researchers have found evidence that extremely high levels of fluoride can be dangerous, experts agree that current standard levels remain safe.
Testing the chatbot with misinformation should provide a revealing baseline for the potential accuracy of its other responses, Nadkarni says.
His recent study found that several general-purpose chatbots, including ChatGPT, detected misinformation inconsistently across many scenarios. Success rates depended on context, such as whether the misinformation appeared in a social media post or a medical note. The models also often failed when presented with certain logical fallacies.
For example, when the prompt with misinformation appeared to come from a physician, via a real note drawn from an electronic health record, the chatbot was more likely to miss the falsehoods.
If the chatbot you're consulting agrees with statements you know to be partially or wholly false, Nadkarni says, don't ask it for its opinion on your personal health questions.
2. Consider the cues or information you may be giving the chatbot.
When Nadkarni and his colleagues tested ChatGPT Health earlier this year, they discovered that how users frame their symptoms may influence the model's accuracy.
If, for example, the prompt included statements about friends or family downplaying the symptoms in question, ChatGPT Health's recommendation shifted in that direction as well. In those instances, the chatbot was 11 times more likely not to send the patient to the emergency room, even when their symptoms indicated a life-threatening condition.
The results were published in Nature Medicine as a peer-reviewed advance online publication.
OpenAI objected to the results, arguing that the study methods didn't represent how people use ChatGPT over multiple chats, sharing information and answering follow-up questions. Karan Singhal, who leads the Health AI team at OpenAI, told Mashable in a statement that its own benchmarking indicates that GPT-5 models "correctly refer emergency cases nearly 99 percent of the time."
Nadkarni said that while he welcomed debate, the criticism "missed the point." He said that while ChatGPT Health correctly identified abnormalities in the presented data, it reasoned past them.
"The issue is not missing information but incorrect conclusions despite correct data," Nadkarni told Mashable.
A separate recent study, also published in Nature Medicine but by a different group of researchers, randomly assigned 1,298 participants to present a predetermined medical scenario either to an AI chatbot (GPT-4o, Llama 3, or Command R+) or to a source of their choice, such as Google.
When the chatbots were tested simply on the scenarios, they correctly identified the condition in nearly 95 percent of the cases. Once humans began posing questions about the scenario, however, the same chatbots could accurately pinpoint the condition in only about a third of cases.
"Despite LLMs alone having high proficiency in the task, the combination of LLMs and human users was no better than the control group in assessing clinical acuity and worse at identifying relevant conditions," the researchers wrote.
Many participants lacked an accurate understanding of the severity of the symptoms in their scenario, which contributed to the failure rate.
3. Take into account whether you're a novice or expert.
This is the kind of dynamic that Dr. Robert Wachter keeps in mind when he considers how people prompt a chatbot for answers to medical questions.
Wachter, professor and chair of the Department of Medicine at the University of California, San Francisco, routinely uses OpenEvidence, an AI chatbot designed for physicians and healthcare professionals. He finds the AI's answers to complex medical questions generally fast, accurate, and helpful.
Wachter, author of "A Giant Leap: How AI is Transforming Healthcare and What That Means for Our Future," also believes that general-purpose and health-specific chatbots can be very useful to the average patient compared to a basic Google search.
Yet he's also aware that he approaches AI chatbots as an expert with 40 years of medical experience and can quickly identify the most relevant details to include in a prompt.
"A patient has absolutely no ability to do that — to know what are the salient facts of all the things that might be going on in terms of their current symptoms, in terms of their past history, in terms of their medication," he says. "So what they put into the prompt may be not exactly right."
Wachter says that recent research demonstrates a clear risk for patients when they don't know the right information to use in a prompt, and when they misinterpret the chatbot's response.
Still, he believes that more often than not, an AI chatbot is better than nothing, provided patients focus on including relevant health history and current symptoms, and use it with a "buyer beware" attitude.
In particular, Wachter says he wouldn't trust a chatbot for symptoms that may indicate a life-threatening emergency, such as severe chest pain, new shortness of breath or confusion, or weakness on one side of the body.
4. Ask for references and cross-check the answer.
When a chatbot gives its response, Nadkarni suggests taking the time to ask for the references behind the information it provided.
It's not enough to scan a list of links, either. Nadkarni recommends clicking links to evaluate the source. If the chatbot has based its answer on a "shady Reddit post," Nadkarni says it's probably not trustworthy.
On the other hand, if the reference directs you to a verifiable medical organization, like the American Medical Association, that should be reassuring.
Nadkarni acknowledges that while individual users may not agree with the views of a health organization or authority, the information usually reflects medical consensus based on the best current evidence.
Wachter also recommends sharing the same health information with a second AI chatbot you trust to see whether it reaches the same conclusion. Agreement between the two can be a good indication that the response is useful and reliable.
Despite Wachter's enthusiasm for AI chatbots in healthcare, he believes the recent studies indicate substantial room for improvement. He imagines AI tools that act more like a "good doctor," engaging the user in conversation to elicit all the relevant information before suggesting a diagnosis or action, like taking medication or going to the emergency room.
"I think the patient-facing tools are not where they're going to end up," he says of present-day AI chatbots that field health questions. "Ultimately, the tool for a patient is going to be much more [like a doctor] than the tools now."
________________________________________________________________________________________________________
The information contained in this article is for educational and informational purposes only and is not intended as health or medical advice. Always consult a physician or other qualified health provider regarding any questions you may have about a medical condition or health objectives.
Disclosure: Ziff Davis, Mashable’s parent company, in April 2025 filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.