Hebrew Cantillation Marks And Their Encoding

by Helmut Richter



III. Character Codes


Cantillation Marks In Modern Character Codes

Here, the representation of the cantillation marks is given in two character codes: Unicode and the Michigan-Claremont (MC) code.

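When it comes to the question of how cantillation marks are represented in these character codes, we need to make an important distinction between marks and symbols: a mark is a single written sign, characterised by its shape and position, whereas a symbol is a unit with a definite meaning that consists of one or two marks.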

In particular, if marks have the same shape and position but different meanings, such as Tipcha, Tarkha and Meayla, they are considered the same mark but distinct symbols.

Now, a character coding can adhere to one of the following strategies:

  1. Codes are assigned to the marks, irrespective of their meaning and of how they combine into symbols. This includes marks that occur only in combination with another mark to form a symbol.

  2. Codes are assigned to the symbols. The corresponding marks must then be derived during the rendering process.

  3. Codes are assigned to the marks, but dependent on what symbols they are components of.

Each of these approaches has its benefits and drawbacks. Concentrating on marks facilitates reading and writing (i.e. the transformation between written or printed text and its coded representation) but complicates processing the content of the coded text. Violations of the usage standard for symbols can be dealt with much more easily if only the marks are coded, which makes it possible to encode texts even if the marks do not always combine into meaningful symbols. In this spirit, the Michigan-Claremont manual insists: "Code what is written, not what is meant."

The two codings presented in the table both follow the first of the strategies, with the Michigan-Claremont code making a few distinctions in the spirit of strategy 3. Therefore, there is sometimes more than one MC code value that corresponds to the same Unicode value.
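The difference between coding marks and coding symbols can be made concrete with a small sketch. It uses the three symbols named above, which share one mark. The class, table and function names are hypothetical and serve only as illustration; U+0596 is the Unicode character HEBREW ACCENT TIPEHA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str  # name of the symbol (carries the meaning)
    mark: str  # the written mark that renders the symbol


# Hypothetical inventory: three distinct symbols, one shared mark.
TIPEHA_MARK = "\u0596"  # HEBREW ACCENT TIPEHA
SYMBOLS = [
    Symbol("Tipcha", TIPEHA_MARK),
    Symbol("Tarkha", TIPEHA_MARK),
    Symbol("Meayla", TIPEHA_MARK),
]


def encode_marks(symbols):
    """Strategy 1: store only what is written, one code per mark."""
    return "".join(s.mark for s in symbols)


def candidate_symbols(mark):
    """Going back from a coded mark to a symbol is ambiguous."""
    return [s.name for s in SYMBOLS if s.mark == mark]
```

Encoding the three symbols yields the same character three times, and `candidate_symbols("\u0596")` returns all three names. Recovering the intended symbol therefore needs context that a mark-based code does not store.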

As both these codes are based on marks as distinct from symbols, a new private code for the symbols and their constituents was developed for the purpose of this exposition. Without such a code, it would have been very difficult to maintain tables of symbols. Such a code can also serve as a basis for privately used characters in a publicly standardised code when a distinction of marks according to strategy 3 above is needed. This private code is explained in the next section together with the abbreviations used in the syntax chapters.

A Systematic Code Reflecting The Semantics Of The Symbols

All publicly standardised character codes for cantillation marks covered in this article are codes for marks, irrespective of how the marks combine into symbols and irrespective of the semantics of the symbols. In contrast, the syntax description showed that the same mark can be part of different symbols (e.g. Legarmeh (=Paseq) as part of Shalshelet Gadol, of Mahpakh Legarmeh and of several others), and that the same symbol can have different semantic significance (e.g. Revia as king or as duke in the 3 books, or Mahpakh Legarmeh with three different possible ranks). When these differences are important, one needs a code reflecting them.

The code proposed below has been developed in order to have a sorting criterion for the tables in this article. It can, however, also be used in texts containing cantillation marks if a finer distinction of the marks is needed than the one given by their shape and position alone, for instance when a program is written to distinguish the different possible semantics of each given mark. It can be used together with Unicode if its range is embedded into the private use area, e.g. at U+E100 to U+E1FF.
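A minimal sketch of such an embedding follows. It assumes that the systematic code value (00 to FF) is simply added to the base U+E100 given as an example above; the function names are hypothetical, and any other free block of the private use area would serve equally well.

```python
PUA_BASE = 0xE100  # example base from the text, not a standardised assignment


def sc_to_pua(sc: int) -> str:
    """Map a systematic code value (0x00..0xFF) to a private-use character."""
    if not 0x00 <= sc <= 0xFF:
        raise ValueError(f"systematic code out of range: {sc:#x}")
    return chr(PUA_BASE + sc)


def pua_to_sc(ch: str) -> int:
    """Inverse mapping; rejects characters outside the embedded range."""
    cp = ord(ch)
    if not PUA_BASE <= cp <= PUA_BASE + 0xFF:
        raise ValueError(f"not an embedded systematic code: U+{cp:04X}")
    return cp - PUA_BASE
```

Under this assumption, the systematic code 18 would be stored as U+E118, and the round trip `pua_to_sc(sc_to_pua(0x18))` gives back 0x18.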

The design principles of this code are:

21 books                                3 books
distinctive                   conj.     distinctive                   conj.
00  SoP0  20  Rvi2            60  Mun   80  SoP0  A0  RvG2            E0  Mun
                    44  Paz3            84  OYr1  A4  RvQ2  C4  Paz3  E4  AtH
                    46  QaP3  66  Glg   86  AzL1                      E6  Glg
08  Atn0            48  TlG3  68  Mer   88  Atn1  A8  Dhi2            E8  Mer
                                        8A  Paz1  AA  MpL2            EA  MrM
                              6C  TlQ   8C  Rvi1                      EC  ShQ
                              6E  May                                 EE  Tar
10  Sgl1  30  Zar2  50  Ger3  70  Qad             B0  Tsi2  D0  AzL3  F0  Qad
12  Sha1            52  Grm3                                D2  MpL3  F2  Ill
14  ZqQ1  34  Psh2            74  Mhp   94  RvM1                      F4  Mhp
16  ZqG1  36  Ytv2                                                    F6  MpM
18  Tip1  38  Tvr2  58  Lgm3  78  MeK   98  ShG1
                              7A  Dar
                              7C  Mf    9C  MpL1                      FC  Mf
                    5E  Pq    7E  Mg                        DE  Pq    FE  Mg
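The layout of the code chart above suggests a regular bit structure, which the following sketch reads off. This is an inference from the positions of the entries in the chart, not a rule stated in the text: the top bit appears to separate the 21 books from the 3 books, and the next two bits appear to select one of the four columns in each half, the last of which holds the conjunctive symbols. The function name is hypothetical.

```python
def describe_sc(code: int):
    """Split a systematic code into (book group, column kind, column index).

    Assumption inferred from the chart: bit 7 selects the book group,
    bits 5-6 select the column within that half (3 = conjunctive).
    """
    if not 0x00 <= code <= 0xFF:
        raise ValueError(f"systematic code out of range: {code:#x}")
    books = "3 books" if code & 0x80 else "21 books"
    column = (code >> 5) & 0b11
    kind = "conjunctive" if column == 3 else "distinctive"
    return books, kind, column
```

For example, `describe_sc(0x18)` places Tip1 in the first distinctive column of the 21 books, and `describe_sc(0xEE)` places Tar in the conjunctive column of the 3 books. The result describes the column only: entries with two-letter abbreviations such as Pq (5E, DE) sit in these columns although they are "other characters".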

In the code chart above, the same abbreviations for the symbols have been used as in the syntax charts. What the abbreviations stand for is listed in the tables. There, similar abbreviations will be defined for the marks as well. The design principles for these abbreviations are:

Here is a summary of how the ranks of the symbols are denoted by the numbers in the abbreviations and by the colours in the syntax charts and in the tables:

Abbr.   rank of symbol
xxx0    final emperor
xxx0    non-final emperor
xxx1    non-final king
xxx1    final king
xxx1    king after Atnach (3 books only)
xxx2    non-final duke
xxx2    final duke
xxx3    non-final officer
xxx3    final officer
xxx     servant = conjunctive symbol
xx      other character
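The pattern of the abbreviations in this summary can be checked mechanically. The sketch below derives the rank from the form of an abbreviation: three letters plus a digit for distinctive symbols, three letters for servants, two letters for other characters. The function name is hypothetical, and the distinction between final and non-final symbols is not recoverable this way, since it is conveyed by colour rather than by the abbreviation.

```python
RANKS = {"0": "emperor", "1": "king", "2": "duke", "3": "officer"}


def rank_of(abbr: str) -> str:
    """Return the rank class implied by the form of an abbreviation."""
    if len(abbr) == 4 and abbr[:3].isalpha() and abbr[3] in RANKS:
        return RANKS[abbr[3]]
    if len(abbr) == 3 and abbr.isalpha():
        return "servant"
    if len(abbr) == 2 and abbr.isalpha():
        return "other character"
    raise ValueError(f"unrecognised abbreviation: {abbr!r}")
```

With the abbreviations of the code chart, `rank_of("SoP0")` gives "emperor", `rank_of("Mun")` gives "servant" and `rank_of("Pq")` gives "other character".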

Legend For The Tables

SC = systematic code

a private code for both cantillation symbols and cantillation marks based on their semantics.

For the rationale and explanations, see above.

Abbr. = Abbreviation

abbreviation used in the code charts of the syntax description.

Both the digit in the abbreviation and the background colour indicate the rank of the symbol. For detailed explanations, see above.

Position

the position of the mark(s) relative to the text

The position of a mark consists of two features: its place in the word and its position relative to the letter. For the former, the following codes are used:

codes for marks and for symbols consisting of only one mark:
a      mark prepositive to the word, carries no information about stress
b      mark on unstressed syllable
c      mark at the initial consonant of stressed syllable
d      mark postpositive to the word, indicates stress on ultimate syllable unless other mark present
e      mark postpositive to the word, carries no information about stress

code combinations for symbols consisting of two marks:
ac     marks a and c, both mandatory
a(c)   mark a prepositive to the word with no information about stress; infrequently extra mark c at stressed syllable
bc     marks b and c, both mandatory, marks mostly on the same word but mark b sometimes on the preceding word
(b)c   marks b and c, marks mostly on the same word but mark b sometimes on preceding word or missing
(c)d   mark d if stress on ultimate syllable, otherwise two marks c and d
ce     marks c and e, both mandatory
(c)e   mark e postpositive to the word with no information about stress; infrequently extra mark c at stressed syllable
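The place-in-word codes listed above have a simple surface syntax: a sequence of the letters a to e, each optionally in parentheses. The sketch below splits such a code into its components and records whether each mark is written without parentheses. The function name is hypothetical, and the flag only mirrors the notation: what the parentheses mean in detail differs from code to code, as the descriptions above show.

```python
import re

# One component: a letter a-e, either in parentheses or bare.
_TOKEN = re.compile(r"\(([a-e])\)|([a-e])")


def parse_position(code: str):
    """Split a position code into (letter, always_present) pairs."""
    parts = []
    pos = 0
    while pos < len(code):
        m = _TOKEN.match(code, pos)
        if m is None:
            raise ValueError(f"invalid position code: {code!r}")
        in_parens, bare = m.groups()
        parts.append((in_parens or bare, bare is not None))
        pos = m.end()
    return parts
```

For instance, `parse_position("a(c)")` yields the mark a as always present and the mark c as not always present, while `parse_position("bc")` marks both components as always present.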

In addition, the codes in the mark table contain an indication of the position of the mark relative to the letter where it is placed: a and b for above and below, optionally with l or r for left and right, and f for final (a mark that is placed after the word like a spacing character).

The information given by these codes is supplemented by a symbol in the next column. There, the space occupied by the entire word is depicted as grey area, so that a mark at the right or left of this area denotes a mark that is prepositive (or postpositive, resp.) to the entire word. If the symbol consists of two marks, their positions are shown in red and pale blue as follows:

  • In the cantillation mark table, the red spot shows the position of the mark at hand, and the pale blue spot shows the position of the mark with which the mark at hand is combined to form a symbol.

  • In the cantillation symbol tables, the red spot shows the position of the primary mark (typically the mark placed on the consonant of the stressed syllable).

Shape

the shape of the marks without indication where they are positioned relative to the text

If there are two marks, they are read from right to left. For instance, if the position code is ac, the mark in position a is shown right of the mark in position c.

Name

the name of the symbol or mark

  • In the cantillation mark table, the name of the mark is given if the mark has a name of its own; otherwise the name of the symbol is given. This can lead to a situation like the following: the symbol Sof Pasuq consists of two marks; one of them, Silluq, has a name of its own, while the other has none and is listed as Sof Pasuq. In any case, only one name is given for a mark, even if it also has other names.

  • In the cantillation symbol tables, one column of the table contains one or more synonyms or different Latin-script spellings of the symbol's name; another column contains one of the names (not necessarily the first in the other column) in Hebrew script.

MC

code value in the Michigan-Claremont encoding

Note that this is not a decimal or hexadecimal number but a string consisting of two decimal digits.
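In a program, this means that MC code values are best kept as strings. The following sketch shows a simple validity check; the function name is hypothetical, and the value "05" is only a sample string, not an entry taken from the tables.

```python
def is_mc_code(value: str) -> bool:
    """True if value is a string of exactly two ASCII decimal digits."""
    return len(value) == 2 and value.isascii() and value.isdigit()


# Converting to an integer and back would lose a leading zero.
sample = "05"
round_tripped = str(int(sample))  # "5", no longer a valid two-digit code
```

Here `is_mc_code(sample)` is true, whereas `is_mc_code(round_tripped)` is false.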

Unicode

code value of the mark in Unicode

Unicode name

name of the mark in Unicode
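Unicode code values and names of this kind can be looked up with Python's standard `unicodedata` module. The sketch below scans the Hebrew block (U+0590 to U+05FF) for characters whose name begins with "HEBREW ACCENT". The helper name is hypothetical, and the result depends on the Unicode version shipped with the Python interpreter in use.

```python
import unicodedata


def hebrew_accents():
    """List (code value, Unicode name) for the accents in the Hebrew block."""
    found = []
    for cp in range(0x0590, 0x0600):
        # Unassigned code points have no name; use "" as the default.
        name = unicodedata.name(chr(cp), "")
        if name.startswith("HEBREW ACCENT"):
            found.append((f"U+{cp:04X}", name))
    return found
```

The list contains, among others, the pairs ("U+0591", "HEBREW ACCENT ETNAHTA") and ("U+0596", "HEBREW ACCENT TIPEHA").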


© Helmut Richter      published 1999-08-30; last update 2001-04-09