Sina Ghertasi Oskouei 1*
1 Digital Dental Unit, Dental Faculty, Tabriz University of Medical Sciences, Tabriz, Iran
Abstract
For decades, the global expansion of regional scientific literature, including strategic efforts by Iranian health journals, has been constrained by a persistent bottleneck: the English-language hegemony. A long-held hypothesis that technology might eventually dismantle this linguistic monopoly is now materializing,1 propelled by rapid advances in generative artificial intelligence (AI). Although more than 90% of top-tier scientific journals remain exclusively English-medium,2 this historical inertia no longer guarantees future dominance. In an era where large language models readily parse semantics and context, the scientific community must confront a fundamental question: is it still justifiable to compel researchers to filter complex scientific thought through the grammatical constraints of a second language?
Historically, publishing outside the English language has been synonymous with scientific invisibility. English-language papers consistently garner higher citation rates2 and generate substantially more engagement across alternative metrics (Altmetrics),3 effectively penalizing researchers for their native linguistic origins. However, the publishing paradigm is shifting. Contemporary AI-driven translation architectures can now reconstruct scientific texts across linguistic boundaries with precision exceeding 95%.4 From the computational perspective of an AI model, a rigorously argued, granular manuscript authored in fluent Persian represents far richer, more actionable data than the same research articulated in broken English, where critical scientific nuances are inevitably compromised by a restricted vocabulary.
Yet overcoming the linguistic barrier addresses only a fraction of the challenge. The central bottleneck in scientific dissemination has shifted from linguistic translation to structural accessibility. Even if AI can flawlessly translate a Persian manuscript, it must first be capable of parsing the underlying document.
Currently, a vast repository of regional research remains computationally locked within PDFs, a format engineered for typographical fidelity and human visual consumption, not for algorithmic extraction. By contrast, major international publishers have long since transitioned to machine-readable formats such as XML and HTML. XML structures data logically, rendering it inherently searchable and interoperable. It is precisely through this structural readability that modern AI-driven academic search engines (such as Consensus or Elicit) can retrieve and recommend these articles globally, regardless of their original language of publication. The JATS 1.4 standard advances this by explicitly supporting multilingual scientific corpora, providing a framework to host metadata and full-text translations symmetrically, without fracturing established monolingual archives.5 Without such standards, even the finest regional articles will appear to AI as little more than vague images rather than usable scientific data.
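To make the contrast with PDF concrete, the following is a minimal sketch of how a program might read titles and abstracts in more than one language from a JATS-style record. It assumes a heavily trimmed, hypothetical record that uses the JATS elements `trans-title-group`, `trans-title`, and `trans-abstract` together with the `xml:lang` attribute. The sample content and the function names `titles_and_abstracts` and `flat_text` are illustrative assumptions and are not drawn from any journal's actual files.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, heavily trimmed JATS-style record: a Persian original
# with an English translated title and abstract stored alongside it.
SAMPLE = """<article xml:lang="fa">
  <front>
    <article-meta>
      <title-group>
        <article-title>عنوان مقاله</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Article title</trans-title>
        </trans-title-group>
      </title-group>
      <abstract><p>متن چکیده</p></abstract>
      <trans-abstract xml:lang="en"><p>Abstract text</p></trans-abstract>
    </article-meta>
  </front>
</article>"""


def flat_text(element):
    """Concatenate all text inside an element; empty string if it is missing."""
    return "".join(element.itertext()).strip() if element is not None else ""


def titles_and_abstracts(xml_text):
    """Return {language: {"title": ..., "abstract": ...}} for one record."""
    root = ET.fromstring(xml_text)
    source_lang = root.get(XML_LANG, "und")  # "und" = undetermined language
    meta = root.find("./front/article-meta")

    # Entry for the language the article was originally written in.
    record = {
        source_lang: {
            "title": flat_text(meta.find("./title-group/article-title")),
            "abstract": flat_text(meta.find("./abstract")),
        }
    }
    # Translated titles, one group per target language.
    for group in meta.iterfind("./title-group/trans-title-group"):
        lang = group.get(XML_LANG, "und")
        record.setdefault(lang, {})["title"] = flat_text(group.find("trans-title"))
    # Translated abstracts, one element per target language.
    for trans in meta.iterfind("./trans-abstract"):
        lang = trans.get(XML_LANG, "und")
        record.setdefault(lang, {})["abstract"] = flat_text(trans)
    return record


if __name__ == "__main__":
    for lang, fields in titles_and_abstracts(SAMPLE).items():
        print(lang, fields)
```

Because each language version carries its own explicit tag, an indexing service could in principle select the original or a translation without any layout analysis, which is the step a PDF-only archive forces on every downstream tool.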