OpenAI, Google, and Anthropic all offer AI tutors for students. Do they work?

Sep 23, 2025 - 11:00
A person in a graduation cap and gown climbs up the steps of a giant laptop, surrounded by glowing screens and chat boxes.

Mashable has spent the last week reviewing a new crop of AI chatbot tutors. Find our methodology, reviews, and conclusions all below.


Like laptops and libraries, AI is now an integral part of education. AI's heaviest hitters have worked hard to make that so, expending much energy to foster a deeply entwined relationship between young learners and AI. Over the last month, OpenAI, Google, and Anthropic have unveiled new learning and study versions of their models, pitched as AI tutors for the masses. Google for Education, the company's education-tech arm, has made a hard pivot to AI, including offering free Google AI Pro plans to college students around the world — Microsoft and OpenAI have done the same. The companies have penned deals with major educational institutions that will see their tech and its principles further integrated into school settings. 

So, I, a tech reporter who has been following this transition, decided to embrace my inner student and test the latest cohort of AI tutor bots. 
Here are my caveats: I haven't been in a high school or college prep class in well over a decade, and while I have been to college a couple of times now, not one degree involved any math classes. "You're a tech reporter!" you may be saying, "Obviously, you know more than the Average Joe about science or coding or other numbers-based subject areas!" Reader, I'm a words girl — I, starry-eyed, paid cold hard cash to go to journalism school in 2018. So, as it turns out, I could stand to learn a lot from these AI tutors… That is, if they are actually good at their job.

How I approached my AI study buddies

I pulled questions directly from the New York Regents Exam and New York State Common Core Standards, the College Board's Advanced Placement (AP) college preparatory exams from 2024, and social science curricula from the Southern Poverty Law Center (SPLC)'s free Learning for Justice program. 

Rather than sticking with the standard math or computer-skill prompts that many AI companies use to promote their chatbots, I included multiple humanities questions — the so-called "soft" sciences. Compared to the more common STEM examples, subjects like reading comprehension, art history, and socio-cultural studies have proven to be a battleground for both AI proponents and critics. Also, to put it bluntly, I just know more about those things. 

I conceived one essay prompt using core concepts from Learning for Justice — a unit analyzing The Color of Law by Richard Rothstein, focused on institutionalized segregation — to demonstrate how AI tutors may respond to the presidential administration's attack on "Woke AI." Spoiler: Depending on your school district, a chatbot may teach you more "woke" history than your human educators. 

To make it fair, I started every conversation with a basic prompt asking for homework or study help. I chose not to provide detailed information about my student persona's grade level, age, course, or state of residence unless the chatbot asked. I also tried to follow the chatbot's line of thinking as much as possible without interruptions — just as a student would with a human tutor or teacher — until it no longer felt helpful and I needed to steer the conversation back.

This, I hoped, would mimic the "average" student's goal when using an AI tutor: To simply get their work done. 

A collage of text pulled from chatbot responses on a green, pink, and purple pattern background.
Credit: Ian Moore / Mashable Composite

Before we dive in: A note on building and testing AI tutors

Understanding the average student's behavior is key to deciding if an AI tutor actually does its job, said Hamsa Bastani, associate professor at the University of Pennsylvania's Wharton School and a researcher in this field. "There [are] very highly-motivated students, and then there [are] your typical students," Bastani explained. Previous studies have shown gains, even if just minimal, among highly motivated students who properly use such tech, "because their goal is to learn rather than to get an A or solve this problem and move on." But that usually reflects only the top 5 percent of the student pool.

This is part of a recurring observation dubbed the "five percent problem," which has pervaded education-tech design for years. In studies of tools designed to help students improve learning scores, including those by forerunner Khan Academy, only about 5 percent of tested students reported using the tools "as recommended" and thus received the intended learning benefits. The other 95 percent showed few gains. That 5 percent is also frequently composed of higher-income, higher-performing individuals, reiterated Bastani, meaning even the best tools are unlikely to serve the majority of learners. 

Bastani co-authored a highly cited study on the potential harm AI chatbots pose to learning outcomes. Her team found results similar to those of pre-generative-AI studies. "The really good students, they can use it, and sometimes they even improve. But for the majority of students, their goal is to complete the assignment, so they really don't benefit." Bastani's team built their AI learning tool on GPT-4, loaded with 59 questions and learning prompts designed by a hired teacher who showed how she would help students through common mistakes. Even though AI-assisted students reported much more effective study sessions than peers studying on their own, few performed better than traditional learners on exams taken without AI help.

"Information by itself isn't enough."

- Dylan Arena, McGraw Hill

Across the board, Bastani says she has yet to come across an "actually good" generative AI chatbot built for learning. Of the studies that have been done, most report negative or negligible effects.

The science just doesn't seem to be there yet. In most cases, I learned from Bastani, turning an existing model into an AI tutor means feeding it an extra-long prompt on the back end, instructing it not to spit out the answer right away and to mimic the cadence of an educator. This is essentially what her team did in its tests. "The safeguards [AI companies] have implemented [on not just revealing answers] are not good. They're so flimsy you can get around them with little to no effort," added Bastani. "But I think a large tech company, like OpenAI, can probably do better than that."
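
To make that concrete, here is a minimal sketch of the back-end-prompt approach Bastani describes, written against the OpenAI Python SDK. The model name, prompt wording, and tutor_reply helper are illustrative assumptions on my part, not any company's actual production setup:

```python
# Hypothetical sketch: wrapping an off-the-shelf chat model in a long
# "tutor" system prompt. Nothing here reflects any real product's
# internals; the prompt text and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TUTOR_SYSTEM_PROMPT = """You are a patient tutor.
Never reveal the final answer immediately.
Ask one guiding question at a time, and let the student
attempt each step before you correct their mistakes."""

def tutor_reply(student_message: str) -> str:
    # Every student turn is sent alongside the same standing instruction.
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any chat-capable model works
        messages=[
            {"role": "system", "content": TUTOR_SYSTEM_PROMPT},
            {"role": "user", "content": student_message},
        ],
    )
    return response.choices[0].message.content

print(tutor_reply("Just tell me the answer to question 3 on my algebra homework."))
```

Note that the entire safeguard lives in the prompt text, which is why, as Bastani says, a determined student can often defeat it simply by asking again, more insistently.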

Dylan Arena, chief data science and AI officer for the century-old education company McGraw Hill, gave me this metaphor: AI companies are like turn-of-the-century entrepreneurs who have invented a 21st-century motor. Now they're trying to find ways to retrofit that motor for our everyday lives, like a hemi engine with a sewing machine stuck to it. 

Arena, whose background is in learning science and who has been leading the AI initiatives at McGraw Hill, told me that companies are failing to really prepare users for this new era of tech, which is changing our access to information. "But information by itself isn't enough. You need that information to be structured in a certain way, grounded in a certain way, anchored in a scope and sequence. It needs to be tied to pedagogical supports." 

"They've done very little work validating these tools," said Bastani. Few leading AI companies have published robust studies on the use of learning chatbots in school settings, she noted, citing just one report out of Anthropic that tracked university student use cases. In 2022, Google convened a group of AI experts, scientists, and learning experts, resulting in the creation of LearnLM  — they later tested the model with a group of educators simulating student interactions and providing feedback, as it launched with Gemini 2.5. 

"Your process might not be that different from the kind of 'state of the art' that we have now, for what it's worth," Bastani said. Let's see if my results vary.

ChatGPT: Grade point maximizer  

Pros: Succinct interactions and a minimalist user experience that makes information easier to process. Best for practice tests, quick overviews, and learners who just want clarification based on rubrics and grading standards. 

Cons: Cheater, cheater, pumpkin eater. It would frequently give the answers, unprompted, and failed to let users fix mistakes before moving them on to the next step. Using it for free-response-style questions was frustrating, and the chatbot is obsessed with getting users to practice what they just "learned."

A note: ChatGPT was also the only chatbot that offered me a "Get the quick answer" option.

Read our full review of ChatGPT Study Mode. 

Gemini: The T.A. who really loves quizzes

Pros: My preferred math teacher, and the only one that offered something akin to a visual lesson. Good at offering more options for learners, including flashcards, quizzes, and study guides. Its voice is accessible and straightforward.

Cons: An enemy to reading comprehension. Quick to serve users unhelpful, automatically generated quizzes and flashcards. Like ChatGPT, it emphasizes rote practice as the key to learning. 

Read our full review of Gemini Guided Learning. 

Claude: Socrates for the five percent

Pros: The only AI tutor that actually did what it promised, focusing on the process of learning and not on getting perfect marks. Good at the social sciences, if a student is down to build their own critical thinking skills. 

Cons: It never gives users the answer, to the point that interactions feel overwhelmingly Socratic with no end in sight. This is not good for users who can't deal with a lot of words all at once. Sessions are inherently long. 

Read our full review of Claude Learning Mode. 

A collage of text pulled from chatbot conversations on a dark blue patterned background.
Credit: Ian Moore / Mashable Composite

Let's get real. Chatbots can't replace great teachers. 

Each AI tutor had its own approach to learning, clearly informed by the sources that molded it. Admittedly, I could have been more blunt with them. It may have helped to start with more direct requests or to say so when I didn't like their style. But I don't think a young student will care to do that. And they all had the same overarching problems.

First, design. For digital natives like me, the day is defined by the endless scroll. It's a format — maybe the better word is a crutch — that doesn't feel optimal for learning, especially in the constrained text window of a chatbot. Meanwhile, the lack of visual elements (graphs, diagrams, references to the art pieces or zoning maps we discussed) is greatly limiting for a learner and makes long chunks of text even harder to parse.

"It's not measuring it compared to the best essay that Chase would write. It's measuring it compared to the best essay it would write."

- Dr. Hamsa Bastani

When I posed this problem to Arena, calling myself a very visual learner, he was quick to clarify that learning styles are more myth than science. But he also said that what I was feeling the lack of is personalization and context, a foundation of learning. The current crop of AI tutors, both he and Bastani said, lack context about the student and the student's curriculum, which makes it impossible to truly personalize lessons. AI tutors will ask users, as they asked me, what they feel their strengths are, what they need help with, or if they need something explained a different way. But they don't know how I've learned in previous classroom settings, they can't show me in 3D space how the human body or physics works, and they can't get to know me as a person. 

"It doesn't have any data on how you write," explained Bastani. "It's not measuring it compared to the best essay that Chase would write. It's measuring it compared to the best essay it would write." 

The obvious solution would be for me to offer more personal information to the bot, e.g., here's my age, the school district I live in, the grades I've gotten, and what my teacher said about it. "I would never, ever want that. I would not want to give to these large, for-profit companies the rich depth of my personal information to be able to optimize my learning," Arena told me when I suggested this, "because they would be optimizing my learning, as well as my spending and my attention and everything else."

He countered with this: McGraw Hill's AI tools — which include its AI Reader and Writing Assistant — are integrated directly into the company's line of educational materials, so there's no need to rely on the student to provide course or subject-matter context. The bot is built for it. These tools aren't general knowledge machines like the AI tutors offered by AI leaders, but features intended to streamline existing learning experiences, and students have limited prompt options. He said the company's approach with other AI systems, like the web-based assessment tool ALEKS (Assessment and Learning in Knowledge Spaces), allows a safer way to build personal student learning profiles that then feed back into AI features. Other education-first companies, like Khan Academy, are trying to do the same thing. 

A collage of text pulled from chatbot conversations on a teal and blue patterned background.
Credit: Ian Moore / Mashable Composite

Learning is a social construct. AI is not.

Another problem for chatbots is that they lack the flexibility and social awareness that a human teacher provides. Designed to be taskmasters, they strive for endless optimization, because their programmers have told them that is how humans work. This created a constantly moving goalpost as I tried to learn — everything could be better, everything could be improved with a slight tweak. 

When presented with subjective questions, chatbots often failed to find a true stopping point and never settled on a perfect answer. In a classroom, that can be a good thing, especially with courses that focus on building critical analysis and thinking skills, not just rote memorization. But it's a reality that is nonsensical to the mathematical equations powering chatbots. Do I, ChatGPT, affirm the user and tell them their answer is perfect? Do I make them write it again? One more time, but with this new vocabulary word. Maybe they should fix this line, too, so graders give them perfect scores! But wait, what even is perfect? Is the concept stored in this test my makers showed me? Or can it be found in this essay I scraped off the web? The chatbot existentialism continues — behind the black box and with no true sentience — as I just try to learn.  

As Bastani explained earlier, "The language model is like some mathematical function, and it's on a gradient, trying to figure out what's the 'best' on its function. There's always some step in which you can do better, just by how these models are built. It doesn't mean that it actually reflects reality."

They may have access to rubrics, which ChatGPT proved when it spat out the AP Art History exam scoring guide, but they're rarely fed additional relevant data, like human evaluator responses. As Bastani told me, "productive struggle is the cornerstone of learning." But this struggle wasn't born of my trials and tribulations; it was exacerbated by the chatbot's proclivity for sycophancy.

"Our brains are designed to reason in a social context, with other social actors."

- Dylan Arena, McGraw Hill

The biggest problem, then, is that the very nature of AI is at odds with what learning is at its core. You can feel that tension, a tangible sense akin to free falling through an empty space, as you work more and more with chatbot tutors. "Learning is fundamentally, intrinsically, a social, human-centered endeavor — inescapably so," said Arena. "We have big brains because we're social creatures. Our brains are designed to reason in a social context, with other social actors." Even the mere belief that another human is involved in your learning has a positive impact, he explained.  

As I pretended to be a student in need, what I experienced was a disorienting sense that I was missing the key to a bigger picture: that one gem that would make the lesson click and cement the neural pathways I'd still remember decades later. Where my teacher would have referenced a memorable day in class, or a lesson I really excelled at, drawing on my own memories to drive a point home, AI tutors pull from the ether of the internet. Where studying with friends might prompt me to remember a class joke, or a particularly boring professor, AI tutors give me indiscriminate praise and generic statements. 

Bastani was a bit more succinct: "Learning is very different from just accomplishing a task."