Diglossia in Arabic and Its Impact on Media — and Now AI

Arabic is diglossic: people speak local dialects but read and broadcast in Modern Standard Arabic (MSA), a variety closer to a second language than a mother tongue. That gap explains why a fluent Arabic speaker can still be a weak Arabic writer, and why today's AI models, trained mostly on MSA, produce Arabic that often sounds off to native ears.

By Dr. Ali Mohamad · Published 15 April 2026 · Long read, 6 min (~980 words) · Category: Language & Localization

This is another article for non-Arab media and PR professionals trying to understand the Arabic content sector.

In my years as a content consultant for media and PR, I have repeatedly met non-Arab specialists baffled that an Arab colleague failed to produce a high-standard Arabic translation of a piece of copy. At the risk of sounding harsh, my answer has always been the same: being an Arabic speaker — even a media specialist — does not automatically make you a good content creator or translator. Why not? The answer is diglossia.

In linguistics, diglossia describes a situation in which two languages or varieties are used by a single language community. With Arabic, the key word in that definition is “used.” As I have argued elsewhere, Arabic is diglossic, and Fus’ha (Modern Standard Arabic, or MSA) is the variety used in media, written and spoken. Here is why that matters.

Arabs speak Arabic as a second language

What Arab children learn at home is colloquial Arabic. MSA is something they learn at school, starting around age six. What do you call a language you begin learning at six? A second language, which effectively makes English a third. By failing to account for this, Arabic-language curricula turn what should be the mother tongue into a difficult subject. The ironic result: many Arabs grow up disliking the formal version of their own language and never become truly fluent in it. Arab youth are often more comfortable in English than in MSA, and in the early internet years they even improvised “Arabizi”: colloquial Arabic written in Latin characters, with numerals standing in for Arabic sounds that the Latin alphabet has no letter for.
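For readers who have never seen Arabizi, the numeral convention can be sketched in a few lines of Python. This is only an illustration: the table below is a small subset of commonly seen substitutions, usage varies by region and by writer, and the names `ARABIZI_DIGITS`, `looks_like_arabizi`, and `explain_digits` are hypothetical helpers invented for this example.

```python
# Illustrative subset of common Arabizi numeral substitutions.
# Assumption: conventions differ by region and writer; this is not a standard.
ARABIZI_DIGITS = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # ayn
    "5": "خ",  # kha
    "7": "ح",  # ha (the pharyngeal "h")
}


def looks_like_arabizi(text: str) -> bool:
    """Return True if any word mixes Latin letters with a mapped digit.

    A rough heuristic only: ordinary tokens such as "mp3" would also match.
    """
    for word in text.split():
        has_latin = any(ch.isascii() and ch.isalpha() for ch in word)
        has_mapped_digit = any(ch in ARABIZI_DIGITS for ch in word)
        if has_latin and has_mapped_digit:
            return True
    return False


def explain_digits(word: str) -> list[tuple[str, str]]:
    """List each mapped digit in a word with the Arabic letter it stands for."""
    return [(ch, ARABIZI_DIGITS[ch]) for ch in word if ch in ARABIZI_DIGITS]
```

Called on a greeting such as "mar7aba", `explain_digits` reports that the 7 stands for ح, a sound with no single Latin letter.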

Colloquial is a poor foundation

Don’t misread me: I am not one of those classical purists who insist MSA should be the everyday spoken language. And let me be precise, because this is widely misunderstood: colloquial Arabic is not lawless. Each dialect is a real linguistic system with its own consistent syntax, rules that native speakers follow instinctively and notice the moment they are broken. What the dialects lack is not grammar but codification: there is no standard spelling, no agreed written form, and no governing authority for any of them. That is what makes colloquial a poor foundation for a writer: a child raised on a fluid, unwritten system will naturally struggle to accept the codified rules of MSA as anything but an imposition.

Media degrees from Arab universities

I have long admired the American universities of the Arab world; the American University of Beirut, for instance, is the dream of many students across the Levant. So when I met my first media graduate from the American University of Sharjah, my expectations were high: an Arabic speaker with an excellent media education must be a superb content writer. She was — in English. In Arabic, she disappointed. I later met media graduates from a range of Arab and international universities who left the same impression.

I won’t pretend to know why such respected institutions produce weak Arabic content writers, and I invite the universities themselves to look into it. What I cannot recommend strongly enough is that academic programs build hands-on training and internships at content houses into their curricula.

What diglossia means for AI and Arabic

Everything above now applies, almost line for line, to artificial intelligence. Large language models learn Arabic from the text available to them — and the overwhelming majority of written Arabic is MSA: news, books, encyclopedias, official content. The dialects people actually live and speak in are mostly oral and badly under-represented in text. So models become reasonably fluent in formal MSA while remaining uneven, and often weak, across Egyptian, Gulf, Levantine, and Maghrebi dialects.

This produces failures that mirror the human ones. General multilingual models stumble on Arabic morphology and dialect, especially without specific tuning. Most models do not even distinguish MSA from dialect: they treat every variety as a single undifferentiated “Arabic,” then answer in formal-register MSA to someone who wrote casually, so the reply lands stiff and tone-deaf. A purpose-built wave of Arabic models (Jais, ALLaM, Fanar, AceGPT, and dialect-focused efforts like Atlas-Chat) exists precisely to close this gap, and benchmarks now test dialect competence separately because MSA scores flatter real-world performance.
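The arithmetic behind separate dialect scoring can be shown with a toy calculation. The sketch below is not drawn from any real benchmark: the records in `RESULTS` are invented purely for illustration, and the function names are hypothetical. It shows only how a single blended score, dominated by MSA samples, can look healthy while the dialect scores underneath it are weak.

```python
from collections import defaultdict

# Hypothetical records: (variety, whether a reviewer judged the output acceptable).
# These numbers are made up for illustration; they are not measurements.
RESULTS = (
    [("MSA", True)] * 18 + [("MSA", False)] * 2
    + [("Egyptian", True)] * 2 + [("Egyptian", False)] * 3
    + [("Gulf", True)] * 2 + [("Gulf", False)] * 3
)


def overall_pass_rate(results):
    """Share of acceptable outputs across all varieties lumped together."""
    return sum(ok for _, ok in results) / len(results)


def pass_rate_by_variety(results):
    """Share of acceptable outputs computed separately for each variety."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for variety, ok in results:
        totals[variety] += 1
        passes[variety] += ok  # True counts as 1, False as 0
    return {variety: passes[variety] / totals[variety] for variety in totals}


if __name__ == "__main__":
    print(f"blended: {overall_pass_rate(RESULTS):.2f}")
    for variety, rate in pass_rate_by_variety(RESULTS).items():
        print(f"{variety}: {rate:.2f}")
```

With these invented records the blended rate is about 0.73, while MSA alone is 0.90 and each dialect is 0.40, which is the kind of gap a per-variety breakdown is meant to expose.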

The lesson for media and PR teams is the one this article started with: fluency is not competence. Just as an Arabic-speaking graduate may write poor Arabic, a model that benchmarks well on MSA may still generate marketing copy that a native reader finds wrong — grammatically passable, culturally and tonally off. AI is a genuine accelerator for Arabic content, but its output is a first draft, not a finished one. It still needs a skilled native editor, and dialect-sensitive work needs one even more.

Fluency in Arabic — human or artificial — is not the same as competence in Arabic. The gap between them is where diglossia lives.

Conclusions

  1. When working in the region, foreign media and communications professionals must understand the peculiar nature of Arabic content.
  2. Arabic content-writing skill should be assessed separately from media-relations and communications skill in PR hiring.
  3. Localizing English into Arabic demands more of the linguist than most languages do — and the same caution applies to anything an AI model produces in Arabic.

Frequently asked questions

What is Arabic diglossia? It is the coexistence of two varieties in one community: Modern Standard Arabic (MSA), used in media, education, and formal writing, and the local dialects used in everyday speech.

Why does a fluent Arabic speaker sometimes write poor Arabic? Because everyday fluency is in dialect, while professional writing requires MSA — effectively a second language learned at school. The two skills are distinct.

Why do AI models struggle with Arabic? They are trained mostly on MSA text, while dialects are under-represented, so output can be formal, tonally off, or weak in dialect — and usually needs native editing.

For Arabic content, localization, and AI-assisted Arabic editing, contact HOC. Dr. Ali Mohamad is CEO and Senior Researcher at HOC.
