A recent analysis reveals a linguistic gap in the cost of chatbots: interacting in Spanish with models like GPT-5 or Claude Opus 4.7 consumes more tokens than doing so in English. The word "desarrollador" can cost up to nine tokens in Claude, compared to six for "developer", while in ChatGPT the ratio is three to one. This happens because tokenizers, trained mostly on English data, split text in other languages into more pieces, making each interaction more expensive for Spanish-speaking users.
Tokenizers and training bias: the technical origin of the extra cost 🤖
Language models do not process whole words but fragments called tokens. The tokenizer of a model like GPT-5 splits text into units based on statistical frequency; because it was trained on data that is 95% English, it recognizes a word like "developer" as a single token, while "desarrollador" is fragmented into several. In Claude Opus 4.7 the absolute counts are higher: "desarrollador" requires nine tokens against six for "developer", one and a half times as many, whereas in ChatGPT the ratio reaches three to one. This bias not only raises the price per query; because more tokens must be processed, it also slows response time and reduces efficiency in large-scale applications, such as virtual assistants or customer service systems in Spanish.
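To see why a vocabulary skewed toward English fragments Spanish words, here is a minimal sketch in Python. It is a toy, not the tokenizer of GPT-5 or Claude: real tokenizers learn their vocabularies with algorithms such as byte-pair encoding, while this example uses a simpler greedy longest-match rule. The vocabulary `VOCAB` and the function `tokenize` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy vocabulary: one whole English word plus short generic
# fragments, mimicking a vocabulary learned mostly from English text.
VOCAB = {"developer", "de", "sa", "rr", "ol", "la", "do", "r"}


def tokenize(word, vocab=VOCAB):
    """Split a word by greedy longest match against the vocabulary.

    This is a simplification for illustration, not byte-pair encoding.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No entry matches: emit a single character as its own token.
            tokens.append(word[i])
            i += 1
    return tokens


english = tokenize("developer")
spanish = tokenize("desarrollador")
print(english, len(english))
print(spanish, len(spanish))
```

In this toy setup the English word survives as one token because it appears whole in the vocabulary, while the Spanish word has to be assembled from short fragments. The exact counts depend entirely on the made-up vocabulary and will not match those of any real model.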
How to close the token gap in artificial intelligence? 🔧
To mitigate this inequality, technical solutions such as optimized multilingual tokenizers or models trained on balanced Spanish corpora are being considered. On the regulatory side, requiring transparency about cost per language could foster competition. Meanwhile, Spanish-speaking users can reduce expenses by using shorter wording or mixing in English technical terms, although this limits accessibility. Linguistic equity in AI is not just a technical problem, but a digital inclusion challenge that deserves urgent attention.
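As a rough idea of what a per-language cost comparison could look like, the sketch below turns the token counts quoted in this article into overhead ratios and an estimated cost. The price per million tokens is a made-up placeholder, not a real rate from any provider, and the ChatGPT counts of 3 and 1 are an assumption derived from the three-to-one ratio. The names `TOKEN_COUNTS`, `overhead_ratio` and `estimated_cost` are hypothetical.

```python
# Token counts for "developer" (en) and "desarrollador" (es).
# Claude figures come from the article; ChatGPT figures are assumed
# from the reported three-to-one ratio.
TOKEN_COUNTS = {
    "Claude": {"en": 6, "es": 9},
    "ChatGPT": {"en": 1, "es": 3},
}

# Placeholder price in dollars per million tokens (NOT a real rate).
HYPOTHETICAL_PRICE_PER_MILLION = 10.0


def overhead_ratio(tokens_es, tokens_en):
    """How many times more tokens the Spanish text needs than the English."""
    return tokens_es / tokens_en


def estimated_cost(tokens, price_per_million=HYPOTHETICAL_PRICE_PER_MILLION):
    """Cost of processing a given number of tokens at a flat price."""
    return tokens * price_per_million / 1_000_000


for model, counts in TOKEN_COUNTS.items():
    ratio = overhead_ratio(counts["es"], counts["en"])
    print(f"{model}: Spanish needs {ratio:.1f}x the tokens of English")
```

With a flat price per token, the cost ratio between the two languages equals the token ratio, so the placeholder price only affects absolute amounts, not the size of the gap.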
If unequal tokenization makes the use of Spanish more expensive in models like GPT-5 or Claude, what economic and social implications could this linguistic gap have for Spanish speakers in the artificial intelligence ecosystem?