Using Arabic script for Swahili

The spelling conventions suggested here for current-day Swahili reflect those developed by Marehemu Mu'allim Sheikh Yahya Ali Omar and used in various manuscripts he wrote, along with the principles set out in the academic article he wrote in collaboration with PJL Frankl [1]. This section sets out the main elements of my interpretation of these sources, with examples. I would be happy to hear from anyone who has comments on the conventions.

[1] Omar, YA in collaboration with Frankl PJL (1997): "An historical review of the Arabic rendering of Swahili, together with proposals for the development of a Swahili writing system in Arabic script." Journal of the Royal Asiatic Society, Series 3, 7, 1: 55-71.

  • Word segmentation is as for current-day Swahili in Roman script. This means that items such as لَ زَ يَ نَ na, ya, za, la are written separately from the following word, even though in manuscripts they may be written attached to that word.
  • All short vowels are marked. Although short vowels are usually omitted in Arabic, this is inadvisable in Swahili because of the different structure of the language, and also because Swahili has 5 vowels instead of 3.
  • The penultimate syllable of a word has its stress/length marked by writing it with a long vowel. ا is used for a, ي for e and i, and و for o and u. (The short vowels a, i, u may be omitted when they occur before a long vowel, eg ساسَ instead of سَاسَ (sasa, now), but this is not recommended.) This also helps to delimit individual words in the Arabic script.
  • Initial vowels use the vowel-carriers أ (a, o, u) or إ (e, i), eg أَنَسٖيمَ (anasema, he is speaking), أُڠَالِ (ugali, porridge), إِذِينِ (idhini, permission).
  • The order of typing is: vowel carrier, then short vowel, then long vowel (if applicable).
  • Word-final vowel sequences [1] are written using vowel-carriers. ئ is used when the first vowel of the word-final sequence is e or i: كُپٗكٖئَ (kupokea, to receive), كُتِئَ (kutia, to place). ؤ is used when the first vowel of the word-final sequence is o or u: كُپٗؤَ (kupoa, to cool), كُسُڠُؤَ (kusugua, to rub). Note that in these instances, there is no need to add a long vowel for the penultimate syllable - thus كُپٖئَ (kupea, to sweep), and not كُپٖيئَ, and كُتٗؤَ (kutoa, to produce), and not كُتٗوؤَ. When the first vowel of the word-final sequence is a, ء (hamza) preceded by ا is used to carry the vowel: مَفَاءَ (mafaa, usefulness), تَاءِ (tai, vulture), بَاءٗ (bao, plank).
  • Word-internal vowel sequences [1] use ئ as the vowel-carrier when the second vowel of the word-internal sequence is e or i: شَئِيرِ (shairi, poetry), كِئِينِ (kiini, pith), كُئِيتَ (kuita, to call). When the second vowel of the word-internal sequence is o or u, ؤ is the vowel-carrier: شَؤُورِ (shauri, advice), مٖؤُوپٖ (meupe, white [class 6]), كُؤٗونَ (kuona, to see). Where a is the second vowel of the word-internal sequence, the vowel-carrier is ئ if the first vowel is e, i, eg ڤِئَازِ (viazi, potatoes), and ؤ if the first vowel is o, u, eg كُؤَنْدِيكَ (kuandika, to write). In the sequence aa, the vowel-carrier أ is used: مَأَنْدِيشِ (maandishi, manuscripts). Note that in these instances, in contrast to word-final vowel sequences, a long vowel is written for the penultimate syllable where appropriate.
  • Arabic loanwords should ideally be written with the letters of the original Arabic word, but where that is not appropriate they can be written with the Arabic equivalent of the Roman letter, eg ذ instead of ض or ظ. Note that the Roman to Arabic converter will always do this, since the standard Swahili Roman orthography does not preserve these distinctions.
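The basic rules in the list above (mark every short vowel, lengthen the penultimate syllable, use a vowel-carrier for an initial vowel) can be illustrated with a minimal Python sketch. It is not the Andika! converter: the function name `to_arabic`, the small consonant set and the restriction to words made of plain consonant+vowel syllables are all assumptions made for illustration. Vowel sequences, digraphs and consonant clusters are deliberately rejected.

```python
import re

# Code points are written as escapes so the mapping is unambiguous.
SHORT = {"a": "\u064E", "e": "\u0656", "i": "\u0650", "o": "\u0657", "u": "\u064F"}
LONG = {"a": "\u0627", "e": "\u064A", "i": "\u064A", "o": "\u0648", "u": "\u0648"}
INITIAL = {"a": "\u0623", "o": "\u0623", "u": "\u0623", "e": "\u0625", "i": "\u0625"}
# Illustrative subset of consonants only (assumption: single-letter consonants).
CONS = {"b": "\u0628", "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
        "r": "\u0631", "s": "\u0633", "t": "\u062A", "z": "\u0632"}

# Optional initial vowel followed by one or more consonant+vowel syllables.
SIMPLE_WORD = re.compile(r"[aeiou]?(?:[bklmnrstz][aeiou])+")


def to_arabic(word: str) -> str:
    """Convert a simple CV-structured Roman word to Arabic script."""
    if not SIMPLE_WORD.fullmatch(word):
        raise ValueError(f"outside the scope of this sketch: {word!r}")
    vowel_positions = [i for i, ch in enumerate(word) if ch in SHORT]
    # The penultimate vowel (if there is one) is written long.
    penult = vowel_positions[-2] if len(vowel_positions) >= 2 else None
    out = []
    for i, ch in enumerate(word):
        if ch in SHORT:
            if i == 0:
                out.append(INITIAL[ch])   # vowel-carrier first
            out.append(SHORT[ch])         # then the short vowel mark
            if i == penult:
                out.append(LONG[ch])      # then the long vowel letter
        else:
            out.append(CONS[ch])
    return "".join(out)


print(to_arabic("sasa"), to_arabic("kitabu"), to_arabic("ulimi"))
```

The output order for each vowel follows the typing order given above: vowel-carrier, short vowel, long vowel.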

[1] The marking of vowel sequences using hamza is handled in just one page of the Omar/Frankl paper (Appendix B: The Hamza in Swahili Arabic script). I have expanded on the principles set out there in an attempt to remove any ambiguities.
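The choice of vowel-carrier for vowel sequences, as described in the two bullets on word-final and word-internal sequences, can be summarised in a small Python sketch. The function name `sequence_carrier` and its signature are hypothetical; the function only returns the carrier letter(s) and leaves the vowel marks and any long vowel to the caller.

```python
YEH_HAMZA = "\u0626"    # hamza on yeh
WAW_HAMZA = "\u0624"    # hamza on waw
ALEF_HAMZA = "\u0623"   # hamza on alef
ALEF = "\u0627"
HAMZA = "\u0621"        # free-standing hamza

FRONT = {"e", "i"}
BACK = {"o", "u"}


def sequence_carrier(first: str, second: str, word_final: bool) -> str:
    """Return the carrier for the second vowel of a two-vowel sequence."""
    if word_final:
        # Word-final sequences: the FIRST vowel decides.
        if first in FRONT:
            return YEH_HAMZA
        if first in BACK:
            return WAW_HAMZA
        return ALEF + HAMZA          # first vowel is a: alef, then hamza
    # Word-internal sequences: the SECOND vowel decides ...
    if second in FRONT:
        return YEH_HAMZA
    if second in BACK:
        return WAW_HAMZA
    # ... unless the second vowel is a, in which case the first vowel decides.
    if first in FRONT:
        return YEH_HAMZA
    if first in BACK:
        return WAW_HAMZA
    return ALEF_HAMZA                # the sequence aa


# kupokea (final e+a), shauri (internal a+u), maandishi (internal a+a)
print(sequence_carrier("e", "a", True),
      sequence_carrier("a", "u", False),
      sequence_carrier("a", "a", False))
```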

| Swahili name | Keystrokes | Arabic script | Roman script / Description | Example |
|  | AltGr+a followed by a | أَ | a - initial | أَسٗومَ asoma (he reads) |
| fataha | a | َ | a - non-initial, non-penultimate | بَهَرِينِ baharini (in the sea) |
|  | a followed by Shift+a | ا َ | a - penultimate | سَاسَ sasa (now) |
|  | \ (backslash) followed by e | إٖ | e - initial | إٖندٖلٖئَ endelea (go on!) |
| kasiri ya kusimama | e | ٖ | e - non-initial, non-penultimate | كٖلٖيلٖ kelele (shout) |
|  | e followed by Shift+e | ي ٖ | e - penultimate | نجٖيمَ njema (good) |
|  | \ (backslash) followed by i | إِ | i - initial | إِسِپٗكُوَ isipokuwa (unless) |
| kasiri | i | ِ | i - non-initial, non-penultimate | كِتَابُ kitabu (book) |
|  | i followed by Shift+i | ي ِ | i - penultimate | مَشِيزِ mashizi (soot) |
|  | AltGr+a followed by o | أٗ | o - initial | أٗكتٗوبَ Oktoba (October) |
| dhuma ya kupindua | o | ٗ | o - non-initial, non-penultimate | كِلِيمٗ kilimo (cultivation) |
|  | o followed by Shift+o | و ٗ | o - penultimate | مْكٗونڠَ mkonga (elephant's trunk) |
|  | AltGr+a followed by u | أُ | u - initial | أُلِيمِ ulimi (tongue) |
| dhuma | u | ُ | u - non-initial, non-penultimate | كُشُكُورُ kushukuru (to give thanks) |
|  | u followed by Shift+u | و ُ | u - penultimate | كُومِ kumi (ten) |

| Swahili name | Keystrokes | Arabic script | Roman script / Description | Example |
| bee | b | ب | b | كِبُورِ kiburi (arrogance) |
| pee | p | پ | p | كُپَاكَ kupaka (to paint) |
| tee | t | ت | t - dental t | فِتِينَ fitina (intrigue) |
|  | t followed by h | ته | t - aspirated dental t (Mombasa) | تهُوپَ t'upa (bottle) |
|  | AltGr+Shift+t | ٹ | t - alveolar t (Mombasa) | ٹُنڈُ tundu (chicken coop) |
| thee | Shift+t | ث | th | ثَمَنِينِ thamanini (eighty) |
| jimu | j | ج | j | جَانَ jana (yesterday) |
| chimu | c | چ | ch | چُونڠوَ chungwa (large orange) |
|  | c followed by h | چه | ch - aspirated ch (Mombasa) | چهُونڠوَ ch'ungwa (medium-sized orange) |
| hhee | Shift+h | ح | h | حَسَن Hasan (Hasan [name]) |
| khee | x | خ | h / kh | خَبَارِ [k]habari (news) |
| dali | d | د | d | كُدَنڠَانيَ kudanganya (to deceive) |
|  | AltGr+Shift+d | ڈ | d - alveolar d (Mombasa) | ٹُنڈُ tundu (chicken coop) |
| dhali | Shift+d | ذ | dh | ذَهَابُ dhahabu (gold) |
| ree | r | ر | r | كُرُودِ kurudi (to come back) |
| zee | z | ز | z | كُزِيمَ kuzima (to extinguish) |
|  | Shift+z | ژ | zh (Northern) | ژِينَ zhina (name) |
| sini | s | س | s | كُسِمَامَ kusimama (to stand) |
| shini | Shift+s | ش | sh | كُشِيكَ kushika (to hold) |
| sadi | AltGr+s | ص | s | صَحِيبُ sahibu (friend) |
| dhadi | AltGr+d | ض | dh | ضِيكِ dhiki (distress) |
| tee | AltGr+t | ط | t | كُطَهِرِيشَ kutahirisha (to purify) |
| zee | AltGr+z | ظ | dh | أَظُهُورِ adhuhuri (noon) |
| ayni | ' (single-quote) | ع | ' | مَعَانَ ma'ana (meaning) |
| ghayni | Shift+g | غ | gh | غَضَابُ ghadhabu (anger) |
| gayni | g | ڠ | g | ڠُنِئَ gunia (sack) |
| ng'ayni | n followed by Shift+n | نݝ | ng' | نݝٗومبٖ ng'ombe (cattle) |
| fee | f | ف | f | فِيڠٗ figo (kidneys) |
| vee | v | ڤ | v | كُڤِيمبَ kuvimba (to swell) |
| qafu | q | ق | q | وَقْفُ waqfu (consecrated) |
| kafu | k | ك | k | كُوكُ kuku (large hen) |
|  | k followed by h | كه | k - aspirated k (Mombasa) | كهُوكُ k'uku (medium-sized hen) |
| lamu | l | ل | l | كُلِيمَ kulima (to dig) |
| mimu | m | م | m - non-syllabic | مِيمِ mimi (I) |
| nuni | n | ن | n | نَانِ nani (who?) |
| hee | h | ه | h | هَاكٗ hako (he is not here) |
| wau | w | و | w | كُوَ kuwa (to be) |
| wau | AltGr+Shift+w | ۏ | w - labio-dental w (Mombasa) | ۏِينٗ wino (ink) |
| yee | y | ي | y | يَاكٗ yako (your) |
| hamza | AltGr+Shift+h | ء | (vowel-carrier) | تَاءٗ tao (arch) |
| hamza | Shift+ , (comma) | ٔ | (marks long vowels used as vowel-carriers) | كُپِكِئَ kupikia (to cook for) |
| sakani | Shift+ . (full stop) | ْ | (marks a consonant without a following vowel) | أَسْكَارِ askari (soldier) |
| shada | Shift+ ' (single-quote) | ّ | (marks a doubled consonant in Arabic words) | وَالنَّهَارِ wa-nnahari (and day) |
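Several rows in the consonant table correspond to Roman digraphs (ch, dh, gh, kh, sh, th) or the trigraph ng', so any Roman-to-Arabic conversion has to split a word into these units before looking up the Arabic letter. A minimal Python sketch of that step follows. The name `tokenise` and the greedy matching strategy are assumptions made for illustration, not a description of how the Andika! converter works, and the sketch ignores the Mombasa aspirates written t', ch' and k'.

```python
# Multi-letter units taken from the Roman column of the table.
# The longest unit comes first so that ng' is tried before shorter matches.
UNITS = ("ng'", "ch", "dh", "gh", "kh", "sh", "th")


def tokenise(word: str) -> list[str]:
    """Split a Roman-script word into letters, digraphs and ng'."""
    tokens = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No multi-letter unit starts here: take a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenise("kushika"))   # ['k', 'u', 'sh', 'i', 'k', 'a']
print(tokenise("ng'ombe"))   # ["ng'", 'o', 'm', 'b', 'e']
```

Note that ng without an apostrophe (as in ingawa) is left as two separate letters, n and g.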

This section sets out a comparison between the systems in Sheikh Yahya's manuscripts, the Omar/Frankl paper, and Andika!. The comparison is summarised in the table below.

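| Feature | Manuscripts | Paper | Andika! |
| Sakani is marked on long vowels | yes | no | no (optional) |
| All short vowels are marked | yes | no | yes |
| Sakani on consonants denotes syllabicity only | no | yes | no |
| Distinction between syllabicity and prenasalisation | yes | yes | yes (not in the converter) |

Sakani on long vowels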

In Sheikh Yahya's manuscripts, ي و carry a sakani when used to mark length/stress in the penultimate syllable, eg مَزِيْوَ (maziwa, milk). However, in the Omar/Frankl article, sakani is not used here (eg مَزيوَ). The suggested spelling in Andika! reflects this (though there is nothing to stop users marking sakani if they wish, and it is possible to choose this as an option in the Roman to Arabic converter).

Marking short vowels

In Sheikh Yahya's manuscripts, all short vowels are marked, but the Omar/Frankl paper proposed that marking these is unnecessary in certain situations:

  • If the short (unstressed, non-penultimate) vowel they represent is identical to a preceding short vowel. For example, in ثَمنين (thamanini, eighty) the second a is omitted because it is preceded by an a (fataha).
  • If the short vowel they represent is identical to a preceding or following long/stressed (penultimate) vowel represented by ي و ا. For example, in ثَمنين (thamanini, eighty) the last i is omitted because it is preceded by ي, and in ذهابُ (dhahabu, gold) the first a is omitted because it is followed by ا.
  • Where all the vowels in a word are identical, except for length/stress. For example: تپكاز (tapakaza, scatter), فكير (fikiri, think), شكور (shukuru, give thanks).

However, the suggested spelling convention in Andika!, as in Sheikh Yahya's own manuscripts, is that all short vowels are marked, thus: شُكُورُ - فِكِيرِ - تَپَكَازَ - ذَهَابُ - ثَمَنِينِ. There are a few practical reasons for this:

  • Short e, o need to be marked in nearly all cases anyway, since the Arabic script has no way otherwise of distinguishing ي meaning i from ي meaning e, or و meaning o from و meaning u.
  • Omitting short vowel marks may conceivably save time when writing, once the rules above are mastered, but this is unlikely to apply when typing - it is probably faster simply to type more or less what would be typed when using the Roman script, including short vowels.
  • The omission of short vowels means that transliteration into Roman script would require post-editing to add vowels. Even if automating the application of the above rules to avoid this were possible, it is likely that the resulting system would be cumbersome.
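To make the paper's three omission rules concrete, here is a Python sketch of one possible reading of them. It is an interpretation checked only against the examples quoted above (thamanini, dhahabu, tapakaza, fikiri, shukuru), not a statement of the paper's exact algorithm. The function name `kept_short_marks` is hypothetical, the input is simply the list of a word's vowels in order, and the sketch assumes that e and o always keep their marks and that the penultimate a, i or u is carried by the long vowel letter alone.

```python
# Vowels whose short marks the paper allows to be dropped (assumption:
# e and o are always marked, since ي and و are otherwise ambiguous).
PLAIN = {"a", "i", "u"}


def kept_short_marks(vowels: list[str]) -> list[bool]:
    """For each vowel of a word, say whether its short mark is written."""
    n = len(vowels)
    penult = n - 2 if n >= 2 else None
    # Rule 3: all vowels identical, so no marks are needed at all.
    if n > 1 and len(set(vowels)) == 1 and vowels[0] in PLAIN:
        return [False] * n
    kept = []
    for k, v in enumerate(vowels):
        if v not in PLAIN:
            kept.append(True)        # e, o: always marked
        elif k == penult:
            kept.append(False)       # long vowel letter stands alone
        elif k > 0 and vowels[k - 1] == v:
            kept.append(False)       # rules 1 and 2: same as preceding vowel
        elif penult is not None and k + 1 == penult and vowels[k + 1] == v:
            kept.append(False)       # rule 2: same as following long vowel
        else:
            kept.append(True)
    return kept


print(kept_short_marks(["a", "a", "i", "i"]))  # thamanini
print(kept_short_marks(["a", "a", "u"]))       # dhahabu
```

Even this simplified version needs several special cases, which supports the point made above about the likely complexity of automating the rules.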

Sakani on consonants

Arabic sukun marks the absence of a vowel after a consonant. In Sheikh Yahya's manuscripts, sakani is used consistently for this purpose (alongside its use on long vowels). Thus: أُنَڤْيٗوٖيزَ (unavyoweza, how you can), كْوَ (kwa, to, by, for). Its most common occurrence is on a nasal before another consonant: أِنْڠَوَ (ingawa, although), نْجٖيمَ (njema, good).

Its use on nasals means that sakani can also denote syllabicity, and in the Omar/Frankl paper, its function appears to be limited solely to that. The aim again was most likely to limit the number of diacritics in the text. The suggested convention in Andika!, however, is to follow the manuscript practice, and use sakani on the first consonant of multi-consonant clusters. In case some users feel that this leads to clutter, an option is added in the Roman to Arabic converter to turn it off.
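The suggested convention (sakani on the first consonant of a multi-consonant cluster, with an option to switch it off) can be sketched in Python as follows. The function name `sakani_positions` and the `mark_clusters` flag are hypothetical and only stand in for the converter option mentioned above. The input is a list of Roman tokens in which each consonant, including a digraph such as sh, is a single item.

```python
VOWELS = set("aeiou")


def sakani_positions(tokens: list[str], mark_clusters: bool = True) -> list[int]:
    """Return the indexes of the tokens that would carry a sakani."""
    if not mark_clusters:
        return []                    # option turned off: no sakani at all
    positions = []
    prev_is_consonant = False
    for i, tok in enumerate(tokens):
        is_consonant = tok not in VOWELS
        next_is_consonant = i + 1 < len(tokens) and tokens[i + 1] not in VOWELS
        # Only the FIRST consonant of a cluster is marked.
        if is_consonant and next_is_consonant and not prev_is_consonant:
            positions.append(i)
        prev_is_consonant = is_consonant
    return positions


print(sakani_positions(list("kwa")))          # [0]
print(sakani_positions(list("unavyoweza")))   # [3]
```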

Distinction between syllabicity and prenasalisation

Although the Roman orthography does not distinguish them, both Sheikh Yahya's manuscripts and the Omar/Frankl paper make a distinction between a syllabic nasal followed by a voiced plosive (eg m̩b) and a prenasalised voiced plosive (eg nɓ). The former is written with a preceding مْ, and the latter with a preceding ن, as in مْبَيَ (mbaya, bad [Class 1]) compared to نبَايَ (mbaya, bad [Class 9]).

Andika! will of course allow this distinction to be made in the Arabic script should a writer wish to do so. However, the Roman to Arabic converter cannot do this (since the distinction is not reflected in the standard orthography), and will always convert mb to مْب, so automatically-converted text will need post-editing to reflect this distinction.
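The post-editing step described above amounts to swapping the nasal in front of the plosive. A minimal Python sketch, assuming the converter output contains م with sakani followed by ب, is given below. The helper name `to_prenasalised` is hypothetical, and deciding which words need the change remains a manual judgement by the writer.

```python
SYLLABIC_MB = "\u0645\u0652\u0628"   # mim + sakani + beh (converter default)
PRENASALISED_MB = "\u0646\u0628"     # nun + beh


def to_prenasalised(word: str) -> str:
    """Rewrite the first syllabic mb in a converted word as prenasalised."""
    return word.replace(SYLLABIC_MB, PRENASALISED_MB, 1)


# Converter-style output for mbaya, post-edited to the prenasalised spelling.
converted = "\u0645\u0652\u0628\u064E\u0627\u064A\u064E"
print(to_prenasalised(converted))
```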