An alphabet is a complete standardized set of letters (basic written symbols), each of which roughly represents a phoneme of a spoken language, either as it exists now or as it may have been in the past. There are other systems of writing such as logograms, in which each symbol represents a morpheme, and syllabaries, in which each symbol represents a syllable.

The word "alphabet" itself comes from alpha and beta, the first two symbols of the Greek alphabet. There are dozens of alphabets in use today. Most of them are 'linear', which means that they are made up of lines. Notable exceptions are the Braille alphabet, Morse Code and the cuneiform alphabet of the ancient city of Ugarit.


History of the Alphabet

  • Wadi el-Hol: 19th c. BC
  • Proto-Canaanite: 14th c. BC
  • Armenian: 405
  • Georgian: 5th c.
  • Orkhon: 6th c.
  • Ogham: 6th c.
  • Hangul: 1446


Among segmental scripts (that is, scripts that use a separate glyph for each phoneme, commonly called "alphabets"), one may distinguish abjads, which only record consonants and were first developed by the Egyptians as part of their hieroglyphic script; true alphabets which record consonants and vowels separately, first developed by the Greeks; and abugidas, in which the vowels are indicated by diacritical marks or systematic modification of the form of the consonants, first developed by the Indians. Examples of present-day abjads are the Arabic and Hebrew scripts; true alphabets include Latin, Cyrillic, and Korean Hangul; and abugidas are used to write Ethiopic, Hindi, and Thai. The Canadian Aboriginal Syllabics are also an abugida, as a glyph stands for a consonant and is rotated to represent the vowel, rather than each consonant-vowel combination being represented by a separate glyph, as in a true syllabary.

The boundaries between these three types are not always clear-cut. For example, Iraqi Kurdish is written in the Arabic script, which is normally an abjad. However, in Kurdish, writing the vowels is mandatory, and full letters are used, so the script is a true alphabet. Other languages may use a Semitic abjad with mandatory vowel diacritics, effectively making it an abugida. On the other hand, the Phagspa script of the Mongol Empire was based closely on the Tibetan abugida, but all vowel marks were written after the preceding consonant rather than as diacritic marks. Although short a is not written, as in the abugidas, one could argue that the linear arrangement makes this a true alphabet. Conversely, the vowel marks of the Ethiopic abugida have been so completely assimilated into their consonants that the system is learned as a syllabary rather than as a segmental script. Even more extreme, the Pahlavi abjad became logographic. (See below.)
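
The three-way distinction can be sketched as a small lookup keyed on how vowels are written. This is only an illustration: the names VowelWriting and classify_segmental_script are hypothetical, and the borderline cases described above (Phagspa, Ethiopic, Pahlavi) would not fit such a simple rule.

```python
from enum import Enum


class VowelWriting(Enum):
    """How a script writes vowels (labels are illustrative, not standard terms)."""
    OMITTED = "omitted"              # only consonants are recorded
    FULL_LETTERS = "letters"         # vowels get their own letters
    MARKS_ON_CONSONANTS = "marks"    # diacritics or modified consonant forms


def classify_segmental_script(vowels: VowelWriting) -> str:
    """Return the script type implied by the way vowels are written."""
    if vowels is VowelWriting.OMITTED:
        return "abjad"
    if vowels is VowelWriting.FULL_LETTERS:
        return "true alphabet"
    return "abugida"


# The Arabic script is normally an abjad, but Kurdish writes every vowel
# with a full letter, so that usage is classed as a true alphabet.
print(classify_segmental_script(VowelWriting.OMITTED))       # abjad
print(classify_segmental_script(VowelWriting.FULL_LETTERS))  # true alphabet
```

Classifying the usage rather than the script family is a deliberate choice here, since the same script can fall into different types depending on the language it writes.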

Thus the primary classification of alphabets reflects how they treat vowels. Further classification can be based on tone, though there are as yet no names to distinguish the various types. Some alphabets simply disregard tone entirely, especially when it does not carry a heavy functional load, as in parts of Africa and the Americas. Such scripts are to tone what abjads are to vowels. Most commonly, tones are indicated with diacritics, the way vowels are treated in abugidas. This is the case for Vietnamese (a true alphabet) and Thai (an abugida). In Thai, tone is determined primarily by the choice of consonant, with diacritics for disambiguation. In the Pollard script (an abugida), vowels are indicated by diacritics, but the placement of the vowel relative to the consonant indicates the tone. More rarely, a script will have separate letters for the tones, as is the case for Hmong and Zhuang. Regardless of whether letters or diacritics are used, the most common tone will not be marked, just as the most common vowel is not marked in Indic abugidas.

Alphabets can be quite small. The Book Pahlavi script, an abjad, had only 12 letters. Theoretically, only 11 letters would be required for a true alphabet to write Pirahã or Rotokas. However, these languages are not actually written. The Scandinavian Futhark was one of the smallest true alphabets in actual use, with just 16 letters. (The Haelsinge Runes are only known to have 15 letters, but they are poorly attested.) The Hawaiian alphabet is sometimes claimed to be smaller, but it actually consists of 18 letters, including the ʻokina and five long vowels. However, other Polynesian alphabets may be as short as is claimed for Hawaiian: Tahitian has one more consonant than standard Hawaiian, but the long vowels may optionally be written by reduplication (aa for ā, etc.). With this orthography, the Tahitian alphabet has just 14 letters. While Polynesian languages have small alphabets because they have few phonemes to represent, Book Pahlavi was small because many letters had been conflated (the graphic distinctions had been lost over time), and diacritics were not used to compensate. For example, a comma-shaped letter represented g, d, y, k, and j. However, such simplifications can perversely make a script more complicated. In later Pahlavi papyri, half of the remaining distinctions were lost, and the script could no longer be read as a sequence of letters at all, but had to be learned as word symbols, that is, as logograms like Egyptian demotic.

The largest segmental script is probably an abugida, Devanagari. Vedic Sanskrit is written in an alphabet of 53 letters, including the visarga mark for final aspiration and special letters for kṣ and jñ, though one of the long els is theoretical and not actually used. The Hindi alphabet must represent both Sanskrit and modern vocabulary, and so has been expanded to 58 with the khutma letters (a dot added to a letter to represent sounds from Persian and English).

The largest known abjad is Sindhi, with 51 letters. The largest true alphabets include Kabardian and Abkhaz (for Cyrillic), with 58 and 56 letters, respectively, and Slovak (for Roman), with 46. However, these scripts include either digraphs and trigraphs (similar to Spanish ch) or diacritics (like Slovak č). The largest true alphabet where each letter is graphically independent is probably Georgian, with 40. The Georgian alphabet is supposed to have been extended to 52 letters to write Aghbanian.

Syllabaries typically include 50 to 400 glyphs (though Pirahã would require only 24 if tone were not indicated, and Rotokas 30), and the glyphs of logographic systems number from the hundreds to thousands. Thus a simple count of the number of distinct symbols is an important clue to deciphering an unknown script.
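
The symbol-count clue can be sketched as a range lookup. The function name candidate_script_types is hypothetical, and the numeric bounds are rough assumptions derived from the figures quoted in this section (11 to about 60 letters for segmental scripts, 50 to 400 glyphs for typical syllabaries, a few hundred and up for logographic systems); real decipherment relies on much more than a single count.

```python
from typing import List


def candidate_script_types(symbol_count: int) -> List[str]:
    """List the script types whose typical inventory size includes symbol_count.

    The ranges are rough assumptions based on the figures quoted in the text.
    """
    ranges = {
        "segmental": (11, 60),
        "syllabary": (50, 400),
        "logographic": (200, None),  # None means no upper bound
    }
    matches = []
    for name, (low, high) in ranges.items():
        if symbol_count >= low and (high is None or symbol_count <= high):
            matches.append(name)
    return matches


print(candidate_script_types(26))    # ['segmental']
print(candidate_script_types(85))    # ['syllabary']
print(candidate_script_types(3000))  # ['logographic']
```

The ranges overlap on purpose: a count of 55, for example, is compatible with both a large segmental script and a small syllabary, so the count narrows the possibilities without settling them.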

It is not always clear what constitutes a different alphabet. French uses the same basic alphabet as English, but many of the letters can carry diacritical accents and other marks (for example, é, à, or ô). In French, these accents are not considered to create additional letters. However, in Icelandic, the accented letters (such as á, í, and ö) are considered wholly distinct. Some alphabets are augmented by ligatures, such as æ (common in Latin writing); by letters borrowed from other writing systems, such as the thorn, þ, found in Old English and Icelandic and borrowed from the Futhark script; and by totally new symbols, such as the eth, ð, also used in Old English and Icelandic, and the letter Ou, ȣ, used in the Algonquin language.
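
The two conventions can be illustrated with Unicode normalization from Python's standard unicodedata module. This is a sketch under assumptions: the helper names strip_diacritics and letter_key are hypothetical, and folding by Unicode decomposition only approximates what any given language actually treats as "the same letter".

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so accented letters fold onto their base letter."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def letter_key(letter: str, accents_are_letters: bool) -> str:
    """Return the alphabet entry a letter counts as under the chosen convention."""
    letter = unicodedata.normalize("NFC", letter)
    return letter if accents_are_letters else strip_diacritics(letter)


# French-style convention: the accent does not create a new letter.
print(letter_key("\u00e9", accents_are_letters=False))  # e
# Icelandic-style convention: the accented letter stays distinct.
print(letter_key("\u00e1", accents_are_letters=True))   # á
# Thorn and eth have no base-letter decomposition, so they survive folding.
print(strip_diacritics("\u00fe\u00f0"))                 # þð
```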

Italian requires only a subset of the Latin alphabet to write all of its own words, but certain letters (such as K, X and W) are retained for the purpose of writing "foreign" words.


Spelling

Each language may establish certain general rules that govern the association between letters and phonemes, but, depending on the language, these rules may or may not be consistently followed. In a perfectly phonemic alphabet, the phonemes and letters would correspond perfectly in two directions: a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling. However, languages often evolve independently of their writing systems, and writing systems have been borrowed for languages they were not designed for, so the degree to which letters of an alphabet correspond to phonemes of a language varies greatly from one language to another and even within a single language.

Languages may fail to achieve a one-to-one correspondence between letters and sounds in any of several ways:

  • A language may represent a given phoneme with a combination of letters rather than just a single letter. Two-letter combinations are called digraphs and three-letter groups are called trigraphs. Kabardian uses a tetragraph (four letters) for one of its phonemes.
  • A language may represent the same phoneme with two different letters or combinations of letters.
  • A language may spell some words with unpronounced letters that exist for historical or other reasons.
  • Pronunciation of individual words may change according to the presence of surrounding words in a sentence.
  • Different dialects of a language may pronounce different phonemes for the same word.
  • A language may use different sets of symbols or different rules for distinct sets of vocabulary items (such as the Japanese hiragana and katakana syllabaries, or the various rules in English for spelling words from Latin and Greek as opposed to the original Germanic vocabulary).
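
The first point in the list, multi-letter combinations standing for one phoneme, can be sketched as a greedy longest-match split of a word into spelling units. The function name split_graphemes is hypothetical, and the set of digraphs passed in is an illustrative assumption rather than a complete inventory for any language.

```python
from typing import List, Set


def split_graphemes(word: str, multigraphs: Set[str]) -> List[str]:
    """Greedy longest-match split of a word into single letters and multigraphs."""
    longest = max((len(m) for m in multigraphs), default=1)
    units = []
    i = 0
    while i < len(word):
        # Try the longest possible letter group first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in multigraphs:
                units.append(chunk)
                i += size
                break
    return units


print(split_graphemes("chorro", {"ch", "ll", "rr"}))  # ['ch', 'o', 'rr', 'o']
print(split_graphemes("schon", {"sch", "ch"}))        # ['sch', 'o', 'n']
```

Greedy matching is the simplest choice, but it can misjudge cases where adjacent letters happen to look like a digraph without functioning as one, so a real system would need word-level exceptions.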

National languages generally elect to address the problem of dialects by simply associating the alphabet with the national standard. However, for an international language with wide variation among its dialects, such as English, it would be impossible to represent the language in all its variations with a single phonetic alphabet.

Some national languages like Finnish have a very regular spelling system with close to a one-to-one correspondence between letters and phonemes. The Italian language has no verb corresponding to 'spell'; scriversi ('is written') suffices, because a correct pronunciation corresponds exactly to a correct orthography. In standard Spanish, it is possible to predict the pronunciation of a word from its spelling, but not vice versa; this is because certain phonemes can be represented in more than one way, but a given letter is consistently pronounced. French, with its silent letters and its heavy use of nasal vowels and elision, may seem to lack much correspondence between spelling and pronunciation, but its rules on pronunciation are actually consistent and predictable with a fair degree of accuracy. At the other extreme are languages such as English, where the spelling of many words simply has to be memorized because it does not correspond to the sounds in a consistent way. This is partly because the Great Vowel Shift occurred after the orthography was established, and partly because English has acquired a large number of loanwords at different times, retaining their original spelling to varying degrees. However, even English has general rules that predict pronunciation from spelling, and these rules are successful most of the time.
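
The two directions of predictability can be checked mechanically on a table of spelling-to-phoneme pairs. The sketch below uses a hypothetical function, predictability, and deliberately tiny toy data; the Spanish-like sample only encodes the simplified assumption that b and v both write the phoneme /b/.

```python
from collections import defaultdict


def predictability(pairs):
    """Check a list of (grapheme, phoneme) pairs in both directions.

    Returns (reading_predictable, spelling_predictable).
    """
    sounds_of = defaultdict(set)     # grapheme -> phonemes it can stand for
    spellings_of = defaultdict(set)  # phoneme -> graphemes that can write it
    for grapheme, phoneme in pairs:
        sounds_of[grapheme].add(phoneme)
        spellings_of[phoneme].add(grapheme)
    reading_ok = all(len(s) == 1 for s in sounds_of.values())
    spelling_ok = all(len(s) == 1 for s in spellings_of.values())
    return reading_ok, spelling_ok


# Toy data, simplified for illustration: b and v both write /b/.
spanish_like = [("b", "b"), ("v", "b"), ("m", "m")]
print(predictability(spanish_like))  # (True, False)
```

A result of (True, True) would correspond to the perfectly regular system described for the ideal case, while (True, False) matches the Spanish pattern of predictable reading but unpredictable spelling.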

The sounds of speech of all languages of the world can be written by a rather small universal phonetic alphabet. A standard for this is the International Phonetic Alphabet.


Collation

An alphabet also serves to establish an order among letters that can be used for sorting entries in lists, a process called collation. Note that the order does not have to be the same in different languages that use the same alphabet; for examples see Latin alphabet: Collating in other languages.
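
Language-specific ordering can be sketched as a sort key built from an explicit alphabet string. The helper collation_key is hypothetical, and the Danish-like order used here (the 26 basic letters followed by æ, ø, å) is stated as an assumption for the example; production software would normally rely on a full collation library instead.

```python
def collation_key(alphabet: str):
    """Build a sort key that ranks letters by their position in `alphabet`."""
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word: str):
        # Letters missing from the alphabet sort after all known letters.
        return [rank.get(ch, len(alphabet)) for ch in word.lower()]

    return key


# Assumed order: the 26 basic letters followed by æ, ø, å.
danish_like = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"
words = ["\u00e5ben", "zebra", "\u00e6ble", "apple"]

print(sorted(words, key=collation_key(danish_like)))
# ['apple', 'zebra', 'æble', 'åben']
print(sorted(words))  # plain code-point order puts å before æ
# ['apple', 'zebra', 'åben', 'æble']
```

The two results differ only in the last two words, which shows that the same set of letters can be ordered differently depending on the alphabet that is taken as the reference.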

In recent years the Unicode initiative has attempted to collate most of the world's known writing systems into a single character encoding. As well as its primary purpose of standardising computer processing of non-Roman scripts, the Unicode project has provided a focus for script-related scholarship.
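
As a small illustration of a single encoding covering several scripts, Python's standard unicodedata module can report the code point and official character name of letters from different alphabets. The helper name describe is hypothetical; the four sample letters are arbitrary choices.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# One letter each from the Latin, Greek, Cyrillic, and Hebrew scripts.
for ch in ["a", "\u03b1", "\u0430", "\u05d0"]:
    print(describe(ch))
# U+0061 LATIN SMALL LETTER A
# U+03B1 GREEK SMALL LETTER ALPHA
# U+0430 CYRILLIC SMALL LETTER A
# U+05D0 HEBREW LETTER ALEF
```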

History and diffusion

The oldest known alphabet consists of recently discovered graffiti, scratched onto rocks in central Egypt around 1800 BC. It appears to have been used by Semitic workers or mercenaries partially integrated into Egyptian society. The alphabet had previously been thought to have originated some 300 years later. (See Middle Bronze Age alphabets.)

The Egyptians already had an alphabet as part of their hieroglyphic script, but only used purely alphabetic writing when transcribing loan words or foreign names. The inventors of the Semitic alphabet, whether Semitic workers or Egyptian bureaucrats, appear to have taken Egyptian hieroglyphs (and not just the Egyptian alphabet) and given them translated Semitic names. So, for example, pr "house" became bayt "house". At this point scholars are still debating whether, when these glyphs were used to write the Semitic language instead of Egyptian, they were purely alphabetic, or whether, for example, the "house" glyph stood for both the consonant b and the sequence byt, as it had stood for both p and pr in Egyptian. However, by the time it was inherited by the Canaanites, it was purely alphabetic, standing only for b (see Phoenician alphabet).

All subsequent alphabets around the world have either descended from this first Semitic alphabet, or else been inspired by one of its descendants, with the possible exception of Meroitic, a seemingly independent 3rd century BC alphabetic adaptation of hieroglyphs. The one modern-day national alphabet that cannot be traced to the Canaanite alphabet graphically is the Maldivian script, which is unique in that, although clearly modeled after existing alphabets such as Arabic, it derives its letters from numerals. (However, there is some speculation that the ancestral Brahmi numerals above 3 might ultimately derive from the Semitic alphabet as well.)

Among alphabets that aren't used as national scripts today, a few are clearly independent of other alphabets in their letter forms: the Zhuyin phonetic alphabet derives from Chinese characters, and the geometric Cree Syllabics (which, despite its name, is an abugida) is derived from British shorthand. The Santali alphabet, an indigenous true alphabet of India, appears to be based on traditional symbols such as "danger" and "meeting place", as well as pictographs invented by its creator. (The names of the Santali letters are related to the sound they represent through the acrophonic principle, but it is the final consonant or vowel that the letter represents: e.g. le "swelling" represents e, while en "thresh grain" represents n.) In the ancient world, Ogham consisted of tally marks, and the monumental inscriptions of the Old Persian Empire were written in an essentially alphabetic cuneiform script whose letter forms seem to have been created for the occasion. All five of these appear to be graphically independent of the other alphabets of the world, but they were devised from their example.

Changes to a new medium sometimes caused a break in graphical form, or made the relationship difficult to trace. It is not immediately obvious that the cuneiform Ugaritic alphabet derives from a prototypical Semitic abjad, for example. Although manual alphabets are a direct continuation of the local alphabet (both the British two-handed and the French/American one-handed alphabets retain the forms of the Latin alphabet, as the Indian manual alphabet does Devanagari, and the Korean does Hangul), Braille, semaphore, maritime signal flags, and the Morse codes are essentially arbitrary geometric forms. The shapes of the Braille and semaphore letters, for example, are derived from the alphabetic order of the Latin alphabet, but not from the letters themselves. Modern shorthand also appears to be graphically unrelated. If it derives from the Latin alphabet historically, the connection has been lost.

However, most alphabets descend directly from the original Semitic script. The Aramaic alphabet, which evolved from Phoenician in the 7th century BC and was used by the Persian Empire, appears to be the ancestor of nearly all of the modern alphabets of Asia. The modern Hebrew alphabet started out as a local variant of Aramaic. (The original Hebrew alphabet has been retained by the Samaritans.) The Arabic alphabet descended from Aramaic via the Nabatean alphabet of what is now southern Jordan. The Syriac alphabet used after the 3rd century CE evolved, through Pahlavi and Sogdian, into the alphabets of northern Asia, such as Orkhon (probably), Uyghur, Mongolian, and Manchu. The Georgian alphabet is of uncertain provenance, but appears to be part of the Persian-Aramaic family.

The Aramaic alphabet is also the most likely ancestor of the Brahmic alphabets of India, which spread to Tibet, Southeast Asia, and Indonesia with the Hindu and Buddhist religions. China and Japan, while absorbing Buddhism, maintained their own logographic and syllabic scripts. However, the Hangul alphabet invented in Korea in the 15th century is based on half a dozen letters apparently derived from Tibetan via the imperial Phagspa alphabet of the Yuan dynasty in China.

Besides Aramaic, the Phoenician alphabet gave rise to the Berber and Greek alphabets. Whereas separate letters for vowels would have actually hindered the legibility of Egyptian, Berber, or Semitic, their absence was problematic for Greek, which had a very different morphological structure. However, there was a simple solution. The alphabet was based on the acrophonic principle, where a letter represented the first sound of its name. Thus bayt (Greek beta) stood for b. All of the names of the letters of the Phoenician alphabet started with consonants. However, several of these were rather soft, and unpronounceable by the Greeks, who simply ignored them. For example, the Greeks had no glottal stop or h, so the Phoenician letters alep and het became Greek alpha and eta. By the acrophonic principle, these now stood for the vowels a and e rather than the consonants ʾ (the glottal stop) and h. As this didn't provide for all twelve vowels, the Greeks created digraphs and diacritics, such as ei, ou, o (which became omega), or simply ignored the deficit (long a, i, u).

Greek is in turn the source for all the modern scripts of Europe. Eastern Greek, where the letter eta stood for a vowel, gave rise to Cyrillic and probably Armenian; the western dialects of Greek, where eta remained an h, produced the Roman alphabet and even the runes.

Although this description presents the evolution of scripts in a linear fashion, this is a simplification. For example, the Manchu alphabet, descended from the abjads of West Asia, was also influenced by Korean Hangul, which was either independent (the traditional view) or derived from the abugidas of South Asia. The Greek alphabet, itself ultimately a derivative of the hieroglyphs, more directly adopted half a dozen Demotic hieroglyphs when it was used to write Coptic.

The most widely used alphabet today is the 26-letter Latin alphabet used, with some modification, for most of the languages of the European Union, the Americas, sub-Saharan Africa, and the islands of the Pacific Ocean: English, Spanish, Portuguese, Indonesian, French, Turkish, German, Javanese, Vietnamese, Italian, Polish, Hausa, Swahili, Filipino, etc. In modern usage, the term Latin alphabet is used for any straightforward derivation of the alphabet used by the Romans. These variants may drop letters from the classical Roman script (Hawaiian) or add letters to it (Czech), and of course many letter shapes have changed over the centuries, such as the lower-case letters you're reading now, which the Romans would not have recognized.

The default Latin alphabet is the Roman, supplemented with J, U, W, and lower-case variants:

A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z

Additional letters may be formed as ligatures (such as ash æ, oethel œ, eszett ß, engma ŋ, and ou ȣ), by diacritics, as digraphs (such as Ll), by modification (such as eth ð, yogh ȝ, and schwa ə), or may even be borrowed from another alphabet (thorn þ, wynn ƿ).

However, these glyphs are not necessarily considered independent letters of the alphabet, depending on the language. For instance, in English æ is considered a graphic variant of ae rather than a separate letter, while in Danish and Norwegian it is a true letter, and is placed at the end of the alphabet along with ø and aa/å.



