Hebrew Cantillation Marks And Their Encoding

by Helmut Richter




V. Unicode Problems


1. Error pertaining to the characters U+0598 and U+05AE

Since the inclusion of cantillation marks in Unicode version 2.0 (1996), there has been confusion about the characters U+0598 HEBREW ACCENT ZARQA and U+05AE HEBREW ACCENT ZINOR.

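All sources other than code tables, that is, various grammars of Biblical Hebrew and Breuer's book on cantillation marks (Mordekhai Breuer, Taamey hammiqra be-21 sefarim uvsifrey eme"t; Jerusalem, TShM"B (=1981)), which I have taken as the ultimate referee, agree on the following: Zarqa and Tsinor are two names for one and the same distinctive accent of the 21 books, written above left of the word, whereas Tsinorit is a different mark of the 3 books that looks similar but is written on top of a letter.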

In contrast to these findings, Unicode (here following the Israeli national standard SI 1311-2) makes a distinction between ZARQA and ZINOR (sic!), where ZINOR seems to play the role of Tsinorit, as the much more similar names suggest. This interpretation, to wit that ZINOR should have been TSINORIT, is also supported by the order of the accents: first all distinctive accents in decreasing strength, then the conjunctive accents; within each class, first the accents for the 21 books (or for all books), then those for the 3 books. From this order, one sees that U+0598 was intended to be a distinctive accent of medium strength in the 21 books, which is exactly what Zarqa is. However, the glyph chart shows the two characters swapped, and the combining classes (whose impact on normalisation is minimal in this particular case) agree with the glyph chart rather than with the above interpretation of the character names.
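The combining classes mentioned above can be inspected with Python's standard `unicodedata` module. The following sketch prints the names and canonical combining classes of the two characters and shows the one situation in which these classes decide the outcome: when both marks follow the same base letter, canonical reordering (in NFD, and likewise in NFC) places U+05AE, class 228 (attached above left), before U+0598, class 230 (above). The base letter Alef is an arbitrary choice for illustration.

```python
import unicodedata

ZARQA = "\u0598"  # named HEBREW ACCENT ZARQA
ZINOR = "\u05AE"  # named HEBREW ACCENT ZINOR

for ch in (ZARQA, ZINOR):
    # Character name and canonical combining class from the Unicode Character Database
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# An arbitrary base letter (Alef) followed by both marks in code point order
text = "\u05D0" + ZARQA + ZINOR
nfd = unicodedata.normalize("NFD", text)

# Canonical ordering sorts adjacent marks by class: 228 comes before 230
print([f"U+{ord(c):04X}" for c in nfd])
```

The reordering only occurs when both marks are attached to the same letter, which fits the remark above that the impact on normalisation is minimal here.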

After the problem was discovered, Unicode decided to fix it with the minimum possible change to the standard: the characters remain swapped, and the misleading character names remain as well. From Unicode version 4.0 (2003) onward, U+0598, named ZARQA, denotes such a mark on top of a letter, usually a Tsinorit (or else a Tsinor or Zarqa in an unusual position), and U+05AE, named ZINOR, denotes such a mark above left of the word, i.e. a Tsinor or Zarqa. Note in particular that ZARQA is normally not the right Unicode character to denote a Zarqa. As a consequence, the logical order of the marks, first distinctive then conjunctive, is broken at this point.
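For someone encoding text, the practical consequence of this decision fits in a small lookup function. The sketch below is only an illustration of the Unicode 4.0 usage just described: the function name `accent_char` and its `on_letter` flag are hypothetical, and the accent names are the transliterations used in this article.

```python
import unicodedata

def accent_char(accent: str, on_letter: bool = False) -> str:
    """Return the Unicode mark for a Zarqa, Tsinor or Tsinorit (Unicode 4.0+ usage).

    Hypothetical helper: 'on_letter' marks a Zarqa/Tsinor written on top of a
    letter instead of in its usual position above left of the word.
    """
    accent = accent.lower()
    if accent == "tsinorit":
        return "\u0598"  # U+0598, despite its name HEBREW ACCENT ZARQA
    if accent in ("zarqa", "tsinor"):
        # Usual position above left of the word -> U+05AE (named ZINOR);
        # unusual position on top of a letter -> U+0598 (named ZARQA)
        return "\u0598" if on_letter else "\u05AE"
    raise ValueError(f"not covered by this sketch: {accent!r}")

for name in ("Zarqa", "Tsinor", "Tsinorit"):
    ch = accent_char(name)
    print(name, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The printed names make the trap visible: an ordinary Zarqa comes out as HEBREW ACCENT ZINOR, and a Tsinorit as HEBREW ACCENT ZARQA.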


2. Order of characters between Holam and Vav

When a vowel is represented in vocalised text by both a vowel point and a consonant (a mater lectionis), older versions of Unicode failed to define the order of these two characters. In nearly all cases, this order is evident from the typographical appearance: the text looks the same as if the mater lectionis were a consonant without a vowel point, so the point clearly belongs to the preceding letter. In the combination of Holam and Vav (Holam Male), however, the dot is written on the Vav itself, so the intended sequence has to be defined explicitly. While many users find the sequence Vav-Holam more natural, some encoding schemes used Holam-Vav by analogy with other pairs, e.g. Hiriq-Yod, in which the mater lectionis follows the vowel point.
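Normalisation cannot reconcile the two orders, which is why the standard has to choose one. Vav is a base character with combining class 0, so canonical reordering never moves a point across it. The sketch below, which uses the word lo (Lamed with Holam Male) purely as an example, builds both sequences and checks whether they become equal under NFC or NFD.

```python
import unicodedata

LAMED, VAV, HOLAM = "\u05DC", "\u05D5", "\u05B9"

vav_holam = LAMED + VAV + HOLAM   # mater lectionis first, then the point on it
holam_vav = LAMED + HOLAM + VAV   # point first, by analogy with Hiriq-Yod

for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, vav_holam) == unicodedata.normalize(form, holam_vav)
    print(form, "equivalent" if same else "distinct")
```

Both forms report the two strings as distinct, so text encoded in the two conventions will not match in searches or comparisons unless one of them is converted.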

Beginning with Unicode version 5.0 (2006), the standard explains the possible typographic difference between a Holam Male and a Vav followed by a Holam Haser, and how this difference can, but need not, be represented in Unicode. From this explanation, one can conclude that the Unicode representation of Holam Male is first Vav, then Holam; otherwise the explanation would not make sense.
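Text produced by one of the older Holam-Vav encodings can be brought into the Vav-Holam order with a simple substitution. The sketch below is a hypothetical helper, not part of any standard: it assumes that a Holam directly followed by a Vav that carries no vowel point or dagesh of its own is a Holam Male, and it leaves every other Vav alone. It does not try to detect a consonantal Vav followed by Holam Haser; where that distinction is to be shown, Unicode 5.0 and later provide U+05BA HEBREW POINT HOLAM HASER FOR VAV.

```python
import re
import unicodedata

VAV, HOLAM = "\u05D5", "\u05B9"

# Holam, then Vav, where the Vav is NOT followed by a vowel point or dagesh
# (U+05B0..U+05BC, U+05C7); such a Vav is assumed to be a mater lectionis.
_HOLAM_VAV = re.compile(HOLAM + VAV + "(?![\u05B0-\u05BC\u05C7])")

def holam_vav_to_vav_holam(text: str) -> str:
    """Hypothetical converter from legacy Holam-Vav order to Vav-Holam."""
    return _HOLAM_VAV.sub(VAV + HOLAM, text)

legacy = "\u05DC" + HOLAM + VAV          # the word lo, encoded Holam-Vav
converted = holam_vav_to_vav_holam(legacy)
print([unicodedata.name(c) for c in converted])

# A Vav carrying its own point (here Patah) is treated as a consonant and kept
untouched = holam_vav_to_vav_holam("\u05DC" + HOLAM + VAV + "\u05B7")
print(untouched == "\u05DC" + HOLAM + VAV + "\u05B7")
```

The heuristic is deliberately conservative: any Vav with its own point is left as it is, so ambiguous cases still need manual review.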


3. Unclear glyphs

The representative glyphs in the glyph chart are unclear for the following accents. Here is what they should look like:

Unicode has decided not to modify the glyphs in the charts.


© Helmut Richter      published 2000-12-06; last update 2014-09-04