More about the project

Goals and scope


In spite of Soqotri’s deeply archaic linguistic profile and the exceedingly rich oral tradition of the islanders, the linguistic and cultural heritage of Soqotra is still understudied and underrepresented. As of today, it is only sparsely known to the Western scholarly audience and practically unknown to academics and laymen in the Arab and Islamic world. This pertains, in particular, to the lexical resources of Soqotri, copious beyond one’s imagination and legitimately comparable to the proverbially rich vocabulary of pre-Islamic and early Islamic Classical Arabic. Producing a truly fundamental, all-encompassing account of the Soqotri vocabulary is, therefore, not only a most urgent task of today’s Semitic lexicography, but also a most challenging one.


The SLOnline project pursues two different, yet closely related goals.

On the one hand, it is conceived as a robust prolegomenon for an eventual book-format reference dictionary of Soqotri (Soqotri-English-Arabic). This publication is expected to become the basic, indispensable tool for the study of Soqotri (and Soqotra) and a new member of the family of MSA dictionaries which the scholarly community owes to T.M. Johnstone's pioneering efforts to describe the lexicons of Jibbali and Mehri.


On the other hand, it will play a key role in establishing Soqotri as an efficient written medium on the Island, which will be achieved through a systematic application of the Arabic-based Soqotri writing developed and implemented by the Soqotri-Russian working team. Moreover, all Soqotri words will be translated not only into English, but also into Standard Literary Arabic, which will make Soqotri’s lexical treasuries fully accessible to the Arabic-reading public in the UAE, the Gulf region more broadly, and, eventually, throughout the Arab and Islamic world. Last but not least, it is expected to give a serious impetus to comparable initiatives involving Soqotri’s sister tongue Mehri in the Yemeni Mahra province.


History of research


● The discovery by Wellstedt

For the modern western world the Soqotri language was discovered by the British navy official James R. Wellstedt who visited the island in January–March of 1835. Wellstedt compiled a 236-item word list of Soqotri, transcribed with Arabic and Roman characters and translated into Arabic and English. 


● The “Vienna corpus” and its aftermath

Wellstedt’s discovery had practically no impact on Semitic studies, and it took about 70 years for Soqotri to become an integral part of Semitic linguistics and philology. Its re-discovery is firmly associated with the name of the prominent Austrian scholar David Heinrich Müller, whose Soqotri Nachlass remains an indispensable source of evidence and inspiration for everybody dealing with the Soqotri language and oral literature up to this day. The main publications which comprise the “Vienna corpus” are as follows:


D. H. Müller. Die Mehri- und Soqoṭri-Sprache. I. Texte. Wien, 1902. – This is the smaller part of Müller’s corpus, collected from several individuals in the course of his only trip to the island in 1899.
By far the most valuable part of the 1902 volume is a small collection of very archaic poetic fragments with parallel translations in German and semi-colloquial Arabic. For many of the verses, Müller was able to obtain from his main informant brief lexical annotations, again in a peculiar, semi-colloquial form of Arabic. This part of the book is also remarkable because the poems were recorded not only in transcription, but also in Arabic script.
Another jewel of the 1902 volume is a lengthy fairy tale in the Soqotri dialect of the island ˁAbd al-Kūrī, which regrettably remains the only specimen of its kind published so far.
Of greatest value, finally, is the analytical part of the book, replete with highly penetrating and pioneering observations on the culturo-historical background of Soqotri folklore in a variety of Middle Eastern and European traditions.
The narrative part of the volume is not very extensive (ca. 80 pages) and comprises a Biblical translation (the book of Ruth) and a few fairy tales. Each piece is bilingual or trilingual with Mehri and/or Arabic (literary or colloquial) versions. The main deficiency of the 1902 volume is that Müller’s acquaintance with Soqotri was at that time still in its incipient stage. As a result, some of his transcriptions are difficult to analyze, and a few poetic samples, already rather opaque in themselves, are virtually unintelligible.


D. H. Müller. Die Mehri- und Soqoṭri-Sprache. II. Soqoṭri-Texte. Wien, 1905. – This is the largest, most important part of Müller’s Soqotri legacy. It comprises the texts recorded during a six-month working season in Vienna in 1902. The whole collection comes from a single informant, ˁAlī ˁĀmir an-Nubhānī, born in the village of Kam, not too far from Soqotra’s capital Hadibo (his dialect does not differ much from that of the Soqotri speakers from the Daarho tribe, the members of the Russian-Yemeni working team of SLOnline).
The narrative part of the 1905 volume comprises ca. 150 pages of Soqotri texts translated into German, with occasional, for the most part laconic, linguistic annotations. Among the texts, there is a good selection of Old Testament translations (from Arabic) and a few other texts with an Arabic background (notably, the famous Cinderella story). The remaining texts are pure specimens of autochthonous oral lore.
Even more spectacular is the poetic part of the book, comprising ca. 750 smaller and larger compositions, most of them with brief but instructive explanations from the informant. 

D. H. Müller. Die Mehri- und Soqoṭri-Sprache. III. Šḫauri-Texte. Wien, 1907. – This publication is primarily dedicated to the texts in Jibbali. The Soqotri texts in the 1907 volume, all in prose, are not very numerous. They were recorded during the second visit of ˁAlī ˁĀmir an-Nubhānī to Vienna in 1904.

M. Bittner. Vorstudien zur Grammatik und zum Wörterbuche der Soqoṭri-Sprache. III. Eine Soqoṭri-Version der ersten sechs Kapitel aus dem Marcus-Evangelium. Wien. – A publication of the first six chapters of the Gospel of Mark, translated into Soqotri by Müller and his informant.


Until recently, virtually all grammatical and lexical information on Soqotri came from the “Vienna corpus”, which thus laid the basis not only for Müller’s and Bittner’s own grammatical and lexical studies, but also for much later research. This includes such groundbreaking works as E. Wagner’s comparative syntax of MSA (E. Wagner. Syntax der Mehri-Sprache unter Berücksichtigung auch der anderen Neusüdarabischen Sprachen. Berlin, 1953) and W. Leslau’s descriptive and comparative dictionary of Soqotri (LS, 1938).


● T. M. Johnstone’s fieldwork

Thomas M. Johnstone treated Soqotri together with other MSA in various publications. Plenty of new lexical evidence from Soqotri is scattered over Johnstone’s dictionaries of continental MSA, but has never been brought together. 


● Lexical and grammatical studies of the French research team

In the last decades of the 20th century, A. Lonnet and M.-C. Simeone-Senelle carried out several field trips to the island and made a considerable contribution to the study of Soqotri, in the domains of both grammar and lexicon. Their grammatical observations are mostly published in the framework of summary descriptions of other Modern South Arabian languages. As far as the lexicon is concerned, important collections of new evidence pertain to the lexical fields of anatomy and kinship terminology.


● Miranda Morris’ collections of lexical items and oral literature

An impressive contribution to the study of Soqotri language, oral tradition and culture in recent decades has been made by the British researcher M. Morris, whose regular and lengthy fieldwork on the island has yielded quite a few important publications, dedicated to the botanical world of Soqotra (including the extensive work in collaboration with A. Miller, Ethnoflora of the Soqotra Archipelago. Edinburgh, 2004) and to the documentation of Soqotri oral lore. The most recent achievement is a three-volume edition of Soqotri poems, with detailed philological and cultural comments, prepared in collaboration with Ṭānuf Sālim Nuḥ, a Soqotri native speaker and an expert on the local poetic tradition (Miranda Morris and Ṭānuf Sālim Nuḥ Di-Kišin, The Oral Art of Soqoṭra, Leiden–Boston, 2021).


● Recent progress by the Russian-Yemeni research team

Vitaly Naumkin’s first trips to the island, which took place in the 1970s–1980s, resulted in a number of important publications dedicated to the anthropology, history, language and oral lore of Soqotra. In the framework of language description (partly in collaboration with V. Porkhomovsky), scores of previously unknown Soqotri lexemes were gathered, analyzed and published. Some of V. Naumkin’s recordings, however, awaited their publication for many years, until the recently formed Russian-Yemeni working team made them available to the academic world.

Since 2010, Naumkin, Cherkashin and Kogan have been pursuing the more ambitious task of systematically documenting, analyzing and interpreting the traditional heritage of Soqotra and its deeply original Semitic tongue, in close collaboration with three native speakers of Soqotri - ‘Isa Gum‘an ad-Da‘rhi, Aḥmad ‘Isa ad-Da‘rhi and Maysun Muḥammad ad-Da‘rhi. Eventually, E. Vizirova and M. Bulakh joined the Russian-Yemeni working team, which has up to now published more than 30 articles dedicated to Soqotrana, as well as two volumes of the Corpus of Soqotri Oral Literature (CSOL I–II), published by Brill (Leiden) in 2014 and 2018.



SLOnline aims to be a comprehensive reference tool for Soqotri lexicography, which means that all major text publications are to be perused and incorporated into the database.
For obvious technical and methodological reasons, these sources can be subdivided into two major categories: (1) those (to be) produced by our fieldwork team, and (2) those produced by other researchers.
The first type of data consists, first of all, of the extensive Soqotri-English-Arabic glossaries of the two published volumes of the Corpus of Soqotri Oral Literature, CSOL I and CSOL II. In the near future, they will hopefully be supplemented by the glossary to the third volume (CSOL III), now in an advanced stage of preparation. It also comprises scores of articles dedicated to various aspects of Soqotrana, notably, the four issues of the “Soqotri Lexical Archive” (the 2010, 2011, 2012 and 2013 seasons), the annotated catalogues of Soqotri verbal lexemes, as well as a number of texts for various reasons published outside the CSOL volumes.
The core element of the second group of sources is the Soqotri legacy of the Austrian South Arabian Expedition, edited by David Müller (1902, 1905, 1907). To integrate these materials into the database means, de facto, to critically review and verify the whole of Wolf Leslau’s Lexique Soqotri: as is well known, this outstanding monument of Semitic lexicography, published in 1938, had the Vienna texts as its own exclusive corpus. Many years of intensive occupation with Müller’s texts have convinced us that these compositions are fairly well understood by modern language consultants, who read them with great pleasure and gladly comment on their lexical and grammatical peculiarities (let alone their fascinating stylistic and literary qualities). The important task of “revitalizing” this precious – and unduly neglected – text collection is thus quite feasible, but will require more than one year of thorough creative work.
Another major source stemming from outside our fieldwork team is the vocabulary of the Soqotri traditional poetry collected and translated in the recently published three-volume set by the Scottish researcher Miranda Morris, a towering figure of today’s MSA studies and a constant source of inspiration for our own work in every respect. The Soqotri lexical resources elicited by Morris are virtually boundless, and the number of “new” lexical items featuring in her texts may well amount to several hundred. Since many of Morris’ compositions were performed in the archaic Western dialects of Soqotri, their integration into the database will constitute a special challenge for our team, with which we nevertheless hope to cope in the course of time.
For more details on the contents of SLOnline, see below under Sources.
From the very outset of our work on SLOnline, it has been decided that all lexical items in the database will have to meet two fundamental criteria: to stem from published, generally accessible sources and to be verified, in form and meaning, by Soqotri native speakers. Both requirements aim at a single goal: maximum reliability of the lexical evidence.
Restriction to published data provides the user with an efficient means of control, especially in the semantic domain: since “published” usually means “published in context”, one is able, for each problematic or doubtful case, to go back to the primary source and assess it directly and independently.
Verification by native speakers means that, at least in the speech of the Da‘arho language community, a given word has a well-defined consonantal and vocalic shape, fixed not only acoustically, but also in writing. In the semantic domain, it assures an authoritative judgement of the native speakers on the basic and peripheral meanings of every lexeme under scrutiny. We take for granted that other speakers of Soqotri will be able to polish, correct and, at times, re-analyze these phonological and semantic facts.
While the second criterion (verification by native speakers) is an absolute sine qua non for the SLOnline team, the first one (presence in published sources) may at times be applied less rigidly: for various reasons, words and especially text examples elicited in the course of our recent, not-yet-published fieldwork may crop up here and there on the pages of the database, even if we do try to reduce such cases to a reasonable minimum.

Lexical sources currently processed


Catalogues of verbal lexemes appended to summary descriptions of the verbal morphology of Soqotri and text illustrations therein:


● V. Naumkin, M. Bulakh, D. Cherkashin, L. Kogan, A. Issa, I. Gumaan. Studies in the Verbal Morphology of Soqotri I/2. Strong Triconsonantal Roots in the Basic Stem (the Lexical Data). ZAL 60 (2014):35–73.

● V. Naumkin, M. Bulakh, D. Cherkashin, L. Kogan, A. Issa, I. Gumaan, M. Mohammed. Studies in the Verbal Morphology of Soqotri II: Weak and Geminated Roots in the Basic Stem. ZAL 63 (2016):19–60.

● V. Naumkin, M. Bulakh, D. Cherkashin, L. Kogan, A. Issa, I. Gumaan, M. Mohammed. Studies in the Verbal Morphology of Soqotri III/1: the Second Stem. ZAL 69 (2019):61–93.

● V. Naumkin, M. Bulakh, D. Cherkashin, L. Kogan, A. Issa, I. Gumaan, M. Mohammed. Studies in the Verbal Morphology of Soqotri III/2: a List of Sound and Weak Verbs Belonging to Second Stem in Soqotri. ZAL 70 (2019):73–91.

● M. Bulakh, L. Kogan, A. Issa, I. Gumaan, M. Mohammed. The Causative Stem in Soqotri. The Analysis. BJALL 12 (2020):260–285.

● M. Bulakh, L. Kogan, A. Issa, I. Gumaan, M. Mohammed. The Causative Stem in Soqotri. The Data. BJALL 13 (2021):239–287.

● M. Bulakh. The Semantics and Syntax of Stem IV in Soqotri. JSS 66 (2021):263–292.


The “Soqotri Lexical Archive”:


● V. Naumkin, L. Kogan, D. Cherkashin, A. Issa, I. Gumaan. Soqotri Lexical Archive: the 2010 Fieldwork Season. ZDMG 163 (2013):61–85.

● V. Naumkin, L. Kogan, D. Cherkashin, A. Issa, I. Gumaan. Soqotri Lexical Archive: the 2011 Fieldwork Season. ZDMG 165 (2015):44–61.

● V. Naumkin, L. Kogan, D. Cherkashin, A. Issa, I. Gumaan. Soqotri Lexical Archive: the 2012 Fieldwork Season. ZDMG 166 (2016):57–80.

● V. Naumkin, L. Kogan, A. Issa, I. Gumaan. Soqotri Lexical Archive: the 2013 Fieldwork Season. ZDMG 172 (2022):253–282.


Miscellaneous text publications of the Russian-Soqotri team not included into CSOL:


● V. Naumkin, M. Bulakh, L. Kogan. Two Erotic Stories from Soqotra Revisited. Babel und Bibel 7 (2014):527–563.

● V. Naumkin, L. Kogan, I. Gumaan, A. Issa, D. Cherkashin. Soqotri Texts in the Phonogrammarchiv of the Austrian Academy of Sciences. An Annotated Edition. Oxford, 2015.

● I. Gumaan, L. Kogan, D. Cherkashin. “The Lord”: An Apology for the Muslim Faith from the Island of Soqotra. JSS 64 (2019):535–563.

● L. Kogan. Сокотрийские колыбельные. Orientalistica 3 (2020):443–456.

● В.В. Наумкин, Л.Е. Коган. Сокотрийская народная поэзия: вечное возвращение. Антропология и этнология: современный взгляд. Moscow, 2021. Pp. 522–534.



Sources to be processed in 2023–2024


● The glossary of CSOL III, scheduled for publication in 2023.

● A few forthcoming publications by the Russian-Soqotri research team, notably, M. Bulakh’s booklet on the passive-reflexive (VIII) and related stems in Soqotri.

● Lexical materials of the “Vienna Corpus” (SAE IV, VI and VII), to be retrieved through a systematic perusal of Leslau’s Lexique Soqotri and an in-depth inquiry into the relevant passages from the texts.

● “New” lexemes scattered over The Oral Art of Soqoṭra by Miranda Morris and Ṭānuf Sālim Nuḥ Di-Kišin (Leiden–Boston, 2021).

● Botanical terminology of Ethnoflora of the Soqotra Archipelago by A. Miller and M. Morris (Edinburgh, 2004). This book, among its many merits, is a valuable source of various categories of non-botanical terminology (landscape and meteorology, foodstuffs, building and household terms, etc.). One has also to keep in mind that many (perhaps most) Soqotri plant names are derived from common nouns and adjectives, some of which are “new” words in our terminology. 

● Animal names from Fauna of the Socotra archipelago by W. Wranik and Omar Al-Saghier (Rostock, 2003).

● A wealth of anatomical terminology found in the four articles by M.-C. Simeone-Senelle and A. Lonnet. A preliminary impression is that “new” words in these studies are quite numerous.

● The Soqotri section of A. Nakano’s Comparative vocabulary of Southern Arabic.


Principles of transcription


In various publications of the Russian-Yemeni working team, several distinct systems of linguistic transcription are employed. In SLOnline, the following principles of transcription are applied (and quotations from other publications are re-transcribed in accordance with these principles).


Inventory of consonantal phonemes


Symbols in brackets denote sounds restricted to Arabic borrowings.
The symbol ʸh stands for a special phoneme, an aspirated palatal approximant. In SLOnline, a distinction is drawn between this phoneme and the biphonemic combination yh (unlike CSOL I and II, where both entities are transcribed as yh).
Whenever possible, the final -ʰ is noted down in SLOnline transcription (unlike CSOL I and II and most other publications of the Russian-Yemeni team). This element appears in certain lexemes and word-forms after the final vowel, but is absent in a number of other forms (thus, a morphologically relevant opposition between -Vʰ# and -V# can be observed).

Inventory of vocalic phonemes:
          i                     u 
            e (ö)         o
                 ɛ    (ɔ)
The symbols in brackets denote sounds which do not have phonemic status (ö is mostly an allophone of e; ɔ features as an allophone of o, mostly in the vicinity of nasals; a mostly appears as an allophone of ɛ in the vicinity of pharyngeals or emphatics). They are nevertheless employed in the SLOnline transcription. The sound ə (mostly an allophone of e), noted down in the transcription of CSOL I and II, is marked as e in the SLOnline transcription. In addition, the SLOnline transcription employs the diphthong ou, which often emerges as an allophone of u.


Structure of the lexical entry


General information


A lexical entry can contain the following parts (the obligatory parts are marked in bold):

- Soqotri orthography (in Arabic-based script)

- The citation form of the lexeme (in linguistic transcription), accompanied by the basic inflectional forms (for nouns, the dual and plural forms; for adjectives, the inflectional forms for both genders, in singular, dual, and plural; for verbs, the forms for imperfect and jussive)

- Audio file with the pronunciation of the citation form by a native speaker (mostly in fluent speech)
Basic morphological information (part of speech; for nouns, gender; for verbs, stem and conjugational type)
Translation into English
Translation into Russian
Translation into Standard Literary Arabic
References to published sources
Text examples
- Morphological notes (additional inflectional forms, such as perfect 3 sg. f. for verbs, passive forms for verbs, verbal noun forms for verbs; diminutive forms for nouns; other additional morphological information)
- Semantic notes (information on the argument structure, with references to text examples; information on additional semantic nuances, with additional text examples)
Root (abstract consonantal root, if it can be established with certainty; abstract base of the non-verbal lexeme, if no consonantal root can be established)
- Other derivatives from the same root
- Visual materials (photo and/or video illustrations)
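
The parts listed above can be pictured as a simple record type. The following Python sketch is purely illustrative: the class name, the field names and the split between required and optional fields are assumptions made for the example, not the actual SLOnline database schema, and the sample values are placeholders rather than real Soqotri data.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class LexicalEntry:
    """One dictionary entry; field names are hypothetical, not the SLOnline schema."""
    # Assumed here to be always present (a guess made for the sketch).
    citation_form: str
    part_of_speech: str
    english: str
    russian: str
    arabic: str
    # Assumed optional parts; an empty value means "not recorded".
    orthography: Optional[str] = None
    inflectional_forms: List[str] = field(default_factory=list)
    audio_file: Optional[str] = None
    references: List[str] = field(default_factory=list)
    text_examples: List[str] = field(default_factory=list)
    morphological_notes: Optional[str] = None
    semantic_notes: Optional[str] = None
    root: Optional[str] = None
    derivatives: List[str] = field(default_factory=list)
    visual_materials: List[str] = field(default_factory=list)


def missing_parts(entry: LexicalEntry) -> List[str]:
    """List the names of parts that are empty (None, "" or an empty list)."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Placeholder values only; no real Soqotri data is used here.
entry = LexicalEntry(
    citation_form="<citation form>",
    part_of_speech="noun",
    english="<English gloss>",
    russian="<Russian gloss>",
    arabic="<Arabic gloss>",
    root="<root>",
    references=["CSOL I"],
)
```

Calling `missing_parts(entry)` on such a record lists the parts still to be filled in, which reflects the fact that an entry need not contain every part.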


Audio materials


Characteristically for the MSA languages, the consonantism (and especially the vocalism) of Soqotri features an intricate inventory of phonetic and phonemic items. In spite of being quite remarkable from both the cross-linguistic and comparative Semitic angles, most of these features are still poorly known outside the narrow circle of MSA specialists. In such a context, it would be unwise to miss the technical opportunities offered by such a (relatively) novel lexicographic tool as an online vocabulary database.

Ideally, each and every lexical item featured in SLOnline is expected to be illustrated by a relatively clean and representative soundtrack, produced by one of our Soqotri language consultants. Clearly enough, this ambitious goal, involving several thousands of lexemes, cannot be achieved at once, especially since the contacts between the two parts of the team remain sparse and the relatively rare fieldwork seasons are densely occupied by other, more pressing tasks. As a temporary, makeshift solution we have opted for cutting out the necessary elements from the sound files already at our disposal, viz. the audio versions of the texts (to be) published in CSOL I–III. In our current impression, these tracks give quite a fair account of Soqotri phonetics, but it is our earnest hope that during the coming years we will be able to replace them with higher quality audio materials.


Visual materials


Since 2012, Ekaterina Vizirova has been carefully administering the photographic archive of the Russian-Yemeni linguistic and folkloristic mission on Soqotra. The archive includes hundreds of images illustrating the natural, economic and cultural realia of the island. A good part of these photos, published in the appendices to CSOL I–II, is already well known to the users of the Corpus. The third collection, pertaining to CSOL III, is now in an advanced stage of preparation. Today, these materials are being gradually integrated into SLOnline. In 2022, when the database project was approved, Vizirova was invited to launch a new fieldwork program, corresponding more exactly to the needs and goals of the Lexicon.

In addition to the extensive photographic documentation, the database makes available to the public a few video illustrations, typically intended to clarify the meanings of verbal lexemes designating non-momentaneous actions related to traditional technological processes. At present, the video archive of SLOnline consists of just a few specimens, but will be seriously enlarged in the course of future fieldwork trips to Soqotra.




While primarily synchronic in its scope, the database will be sensitive to etymological issues, in an attempt to attract the attention of comparative Semitists to the invaluable lexical treasuries of Soqotri. It may also be true, incidentally, that Semitic etymology can at times be helpful in eliciting the semantic nuances of rare and obscure Soqotri words and especially the diachronic path of their semantic evolution.

As is well known, many Soqotri lexemes – including some of the key elements of the basic vocabulary – are etymologically opaque. This state of affairs (which our team has been often unable to improve) is faithfully reflected in the lexical entries of SLOnline: whenever a given entry is obscure from the historical point of view, it is explicitly marked as “no etymology detected” or similar. Elsewhere, the etymological section of the entry will feature a more or less extensive amount of comparative information of various degrees of reliability.

Most typically, three categories of etymological data – often complementing rather than excluding each other – are involved: continental MSA; other Semitic; Arabic dialects of the South of the Arabian peninsula.

The database will continue the systematic inquiry into the common lexical stock of (Proto-)MSA initiated in Chapter 8 of Kogan’s Genealogical Classification of Semitic (2015). A special section in the etymological domain of SLOnline is dedicated exclusively to cognates from the continental sister tongues of Soqotri. Moreover, as far as Mehri is concerned, the data will not be limited to Johnstone’s Mehri Lexicon, but will encompass all major publications so far without systematic lexicographic assessment – from the pioneering SAE volumes by W. Hein, D. Müller and A. Jahn up to the recent publications by A. Sima and J. Watson.

For better known common Semitic roots, the user of the database will normally be redirected to its companion tool “Semitic Etymological Database Online”, created and maintained by Arkhipov, Kogan and Bulakh. More delicate cases (which, for Soqotri, are expected to form a vast majority) are assessed individually with no strict technical rules of presentation.

Constant attention to Arabic loanwords in Soqotri is among the hallmarks of our project, which presupposes a systematic perusal of the extant vocabularies of the dialects of Southern Arabia. One has to regret that many of them (notably, Adeni, Hadrami and Dhofari) are so imperfectly (if at all) described from the lexicographic point of view. Still, the groundbreaking efforts of such outstanding Arabists as P. Behnstedt, M. Piamenta and M. al-Iryani were by no means fruitless, which makes a world of difference between our present diachronic understanding of the Arabic borrowings in Soqotri (and MSA in general) and the relatively modest results, in this domain, of our esteemed predecessors (first and foremost, Wolf Leslau’s Lexique).


User's information


You can find words in the dictionary using the list of words (in consonantal-alphabetical order), the list of word roots, or the search engine on the front page. The search covers the word's transcription, root, meaning (in English, Arabic or Russian), and the references and notes of the respective entry.
The search supports regular expressions. For example, ^ marks the starting position within the string, \b marks a word boundary, and . (a dot) serves as a wildcard matching any single character.
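
The three constructs mentioned above can be tried out offline before being typed into the search box. The sketch below uses Python's standard `re` module on a handful of invented English glosses; it assumes that the site's search engine treats ^, \b and . in the same way as Python does, which is not confirmed here, and the helper name `search` and the sample strings are hypothetical.

```python
import re

# Invented English glosses standing in for the "meaning" field of entries;
# they are illustrative strings, not data taken from SLOnline.
glosses = [
    "goat",
    "to milk a goat",
    "goatherd",
    "boat",
    "a kind of coat",
]


def search(pattern, items):
    """Return the items in which the regular expression matches anywhere."""
    rx = re.compile(pattern)
    return [item for item in items if rx.search(item)]


starts_with_goat = search(r"^goat", glosses)     # ^ anchors at the string start
whole_word_goat = search(r"\bgoat\b", glosses)   # \b marks word boundaries
any_first_letter = search(r"\b.oat\b", glosses)  # . matches any one character

print(starts_with_goat)   # ['goat', 'goatherd']
print(whole_word_goat)    # ['goat', 'to milk a goat']
print(any_first_letter)   # ['goat', 'to milk a goat', 'boat', 'a kind of coat']
```

Note how `^goat` also finds "goatherd", whereas `\bgoat\b` excludes it but finds the word inside a longer gloss; combining the anchors narrows or widens a query as needed.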
Recommended Web browsers (especially for the support of diacritics and Arabic writing): Google Chrome, Apple Safari.


Terms of use

Both textual and multimedia data on the site are freely available for non-commercial use, on the condition that reference is made to Soqotri Lexicon online and its URL. Commercial use is prohibited without prior written permission from the project's head. The project is ongoing, and the lexicon is being completed and updated continuously. The project participants bear no legal responsibility for the accuracy and completeness of the data.