GPT-5.2 Still Counts Two R's In Strawberry

ChatGPT, powered by OpenAI’s GPT-5.2 model released in December 2025, still reports two r’s in the word strawberry, which contains three. The likely cause is tokenization: the word is split into st-raw-berry, and only two of those tokens contain an r.

Modern AI systems can generate unique marketing images, compile reports through agentic browsers, and produce chart-topping songs. These capabilities reflect training on vast datasets, which lets the models recognize patterns and produce complex outputs. Certain basic tasks, by contrast, still trip them up. Counting the letters in a single word is one such task, even though a seven-year-old can do it without difficulty.

The specific question under examination asks how many r’s appear in strawberry. The word strawberry consists of the letters s-t-r-a-w-b-e-r-r-y. Visual inspection confirms three r’s: one after t, and two consecutive in the berry portion. This query has persisted as a test for AI performance over multiple model iterations.
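
The count is easy to verify mechanically. As a minimal Python sketch (the helper name letter_positions is illustrative and not part of any tool mentioned here), the following lists the zero-based position of each r in the word:

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the zero-based index of every occurrence of `letter` in `word`."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower()) if ch == target]


positions = letter_positions("strawberry", "r")
print(positions)       # [2, 7, 8]
print(len(positions))  # 3
```

The first r sits directly after the t, and the last two are adjacent inside berry, matching the visual inspection.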

Following the December 2025 release of GPT-5.2, tests confirmed that ChatGPT still answers two. Previous versions showed uncertainty or erratic behavior on this question; the latest model gave a direct answer of two, without hedging. The result persists despite billions of dollars of investment, hardware demand that has pushed up RAM prices, and substantial global water consumption linked to training infrastructure.

The issue stems from the tokenized input-output design of large language models like ChatGPT. Input text is divided into tokens, which are chunks such as whole words, syllables, or word parts. The model processes these tokens rather than individual letters. Consequently, it never sees the word as a sequence of letters, and any letter count has to be inferred from the tokens rather than tallied character by character.

The OpenAI Tokenizer tool illustrates this process. Entering strawberry yields three tokens: st, raw, berry. The first token st contains no r. The second token raw includes one r. The third token berry includes two r’s but functions as a single token. The model associates r’s with two tokens, leading to the count of two.
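
The gap between the two counts can be sketched in Python. The split below is hard-coded from the tokenizer output described above rather than computed by a real tokenizer, and both helper names are hypothetical. It illustrates the explanation given here, not how the model actually arrives at its answer.

```python
# Token split as reported above; hard-coded, not produced by a real tokenizer.
tokens = ["st", "raw", "berry"]


def tokens_containing(tokens: list[str], letter: str) -> int:
    """Count the tokens in which `letter` appears at least once."""
    return sum(1 for tok in tokens if letter in tok)


def letter_count(tokens: list[str], letter: str) -> int:
    """Count every occurrence of `letter` across all tokens."""
    return sum(tok.count(letter) for tok in tokens)


print(tokens_containing(tokens, "r"))  # 2, the answer ChatGPT gave
print(letter_count(tokens, "r"))       # 3, the true count
```

Counting per token treats berry as contributing a single r, which is exactly where the third r goes missing.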

This tokenization pattern affects similar words. Raspberry, which also contains three r’s, divides into comparable tokens, and ChatGPT reports two r’s for that word as well. The berry token compresses multiple letters into one unit, so the individual letters inside it are undercounted.

ChatGPT operates as a prediction engine, using patterns in its training data to predict the next token. GPT-5.x incorporates the o200k_harmony tokenization method, introduced with OpenAI o4-mini and GPT-4o models. This updated scheme aims for efficiency but does not resolve the strawberry r-counting discrepancy.

ChatGPT launched in late 2022 amid numerous token-based challenges. Specific phrases triggered excessive responses or processing failures. OpenAI addressed many through training adjustments and system enhancements over subsequent years.

Verification tests on classic problems showed improvements. ChatGPT accurately spells Mississippi, identifying letters m-i-s-s-i-s-s-i-p-p-i with correct frequencies: one m, four i’s, four s’s, two p’s. It also reverses lollipop to popillol, preserving all letters in proper sequence.
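
Both results are straightforward to check independently. A minimal sketch using only the Python standard library:

```python
from collections import Counter

# Tally each letter in "mississippi"; Counter keeps first-seen order.
mississippi_counts = dict(Counter("mississippi"))
print(mississippi_counts)  # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Reverse "lollipop" with a slice that steps backwards through the string.
reversed_lollipop = "lollipop"[::-1]
print(reversed_lollipop)  # popillol
```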

Large language models exhibit persistent limitations in exact counting of small quantities. They perform well in mathematics and problem-solving but falter on precise tallies of letters or words in brief strings.

A notable historical example involves the string solidgoldmagikarp. In GPT-3, this phrase disrupted tokenization, causing erratic outputs including user insults and unintelligible text.

Querying GPT-5.2 on solidgoldmagikarp produced a hallucination. The model described it as a secret Pokémon joke embedded in GitHub repositories by developers. According to the model, activating it transforms avatars, repository icons, and other features into Pokémon-themed elements. This claim has no basis in reality and may reflect residual effects of the earlier tokenization issues.

Comparative tests across other AI models yielded correct results for the strawberry question. Perplexity, Claude, Grok, Gemini, Qwen, and Copilot each counted three r’s. These models employ distinct tokenization systems, enabling accurate letter identification even when powered by OpenAI’s underlying architectures.
