The Persian Heh+Hamzeh

This combo is unknown in Arabic but is quite a hot topic in Persian!

How do you write /khāneh-ye man/ = "my house"?

How the /he + ye/ sounds of the /ezāfeh/ construction are represented in writing:

#1

…commonly left unmarked in writing (even if pronounced in speaking)

#2

…as what looks like a “hamzeh” over the “heh” (preferred in works of literature and generally by folks of the "old school")

#3

…as an isolated big “yeh” in line with the rest of the word (preferred by the younger generation in Iran, #2 having been purged from the standardized national school textbooks in favor of #3)

 

How do you type example #2?

As one character with one keystroke:

Please see below the "heh+hamzeh" (in red) in the non-standard Nazanin & Kamran fonts, and what happens if you change what you have typed to Times New Roman & Tahoma (Unicode-standard fonts).

Would you believe that in the 4 examples in the picture to the left, the exact same characters were typed?! The only thing that changed was the FONT.

This one-stroke character (typed in the picture in red) is actually the Arabic character called "teh marbuta" (U+0629, or 1577 in decimal). If you haven't re-mapped your keyboard, this character can be typed by hitting Shift + z. (For WinXP, see both Shift-z and Shift-g.)

I have shown the Times New Roman and Tahoma examples only because I want you to know the risk. If you type a document using this character in a font like Nazanin or Kamran, and then do anything with it other than printing it out on paper or making a picture of it in a graphics application, it may end up looking like the "teh marbuta" rather than the "heh+hamzeh". That is what will happen if your document lands on a computer without the Nazanin, Kamran, etc. fonts installed.
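The point that only the font differs, not the stored character, can be checked directly. The following is a minimal Python sketch using the standard unicodedata module; the variable names are illustrative and not part of any keyboard layout.

```python
import unicodedata

# The single character produced by the one-keystroke method.
one_key_char = "\u0629"

code_point = ord(one_key_char)           # decimal value of the character
hex_label = f"U+{code_point:04X}"        # conventional U+XXXX notation
official_name = unicodedata.name(one_key_char)

print(code_point, hex_label, official_name)
```

Whatever glyph a font chooses to draw, the text itself still carries this code point, which is why the appearance changes as soon as the font does.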

This situation has come about because some desperate people back in the '70s wanted to type Persian before Windows was ready, and so found a way to modify the Arabic "teh marbuta". (Where there is a will, there is a way!)

 

As a two-keystroke combination:

Please see below the Heh+Hamzeh (in red) typed in a proper Unicode font, the way it should be to be scientifically compliant. While it's nice to be scientific and comply with computer standards, as you can see, it doesn't look too nice. Microsoft is planning to update the Tahoma font, and hopefully this will get fixed.

In the picture to the left, the "heh" is one character and the "hamzeh" is a different character. The "hamzeh" is called "Arabic Hamza Above" (U+0654, or 1620 in decimal).

(The non-standard Nazanin, Kamran, etc. fonts, as well as the standard Times New Roman font, lack this character.)
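As a rough illustration of the two-character form, the sketch below builds HEH followed by HAMZA ABOVE in Python and inspects it with the standard unicodedata module. The constant names are made up for this example.

```python
import unicodedata

HEH = "\u0647"          # ARABIC LETTER HEH
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (a combining mark)

heh_hamzeh = HEH + HAMZA_ABOVE

# Two code points, even though they are meant to render as one visual unit.
names = [unicodedata.name(ch) for ch in heh_hamzeh]
is_combining = unicodedata.combining(HAMZA_ABOVE) != 0

print(len(heh_hamzeh), names, ord(HAMZA_ABOVE), is_combining)
```

Because the hamzeh is a combining mark, how well it sits on top of the heh is up to the font, which is consistent with the uneven results described above.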

 

In the meantime, if you are totally desperate to type this as two keystrokes in a unicode-compliant font, you can do this: (Note that I'm hiding this info down at the bottom lest it catch on!)

Tahoma:  خانه‌ء من

Times New Roman: خانه‌ء من

The above was typed with Heh + Zero-Width-Non-Joiner (U+200C) + Arabic Letter Hamza (U+0621).
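For anyone who wants to see that workaround spelled out, here is a small Python sketch that assembles the same phrase from explicit code points and lists each one. The variable names are illustrative only.

```python
import unicodedata

ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
HAMZA = "\u0621"  # ARABIC LETTER HAMZA

khaneh = "\u062E\u0627\u0646\u0647"  # khah, alef, noon, heh
man = "\u0645\u0646"                 # meem, noon

# heh, then ZWNJ so the hamza does not join, then the stand-alone hamza.
phrase = khaneh + ZWNJ + HAMZA + " " + man

for ch in phrase:
    label = unicodedata.name(ch, "(unnamed)")
    print(f"U+{ord(ch):04X} {label}")
```

The listing makes the invisible Zero-Width-Non-Joiner visible, which is handy when checking whether a pasted string really contains it.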

 

For historic interest: U+06C0

The one-keystroke and one-character U+06C0 ۀ has been deprecated from the Persian subset of the Unicode Standard because it decomposes to <06D5 + 0654>, which is actually <AE + HAMZA ABOVE>, instead of the correct <0647 + 0654> (HEH + HAMZA ABOVE).

"ae" (U+06D5) ە is an Arabic-script character (used for languages such as Uyghur and Kurdish) and is not used in Persian, even though to the naked eye it looks just like the Heh.

Although the difference is not noticeable to humans, machine readers will notice the difference and get into all kinds of trouble. For example, they won't be able to Search/Find the characters correctly.
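The Search/Find problem is easy to reproduce. The sketch below, in Python, searches a word typed with the real HEH for the same word typed with the look-alike AE; the names are illustrative.

```python
HEH = "\u0647"  # ARABIC LETTER HEH, the letter Persian uses
AE = "\u06D5"   # ARABIC LETTER AE, a look-alike

stem = "\u062E\u0627\u0646"  # khah, alef, noon

# "khaneh" typed correctly with HEH, and a query typed with AE instead.
document_text = stem + HEH
query_with_ae = stem + AE

found_at = document_text.find(query_with_ae)   # -1 means "not found"
found_correct = document_text.find(stem + HEH)

print(found_at, found_correct)
```

The two words look the same on screen, yet the plain string search only matches the one built from the same code points.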

It is, however, perfectly acceptable to type the Heh+Hamza Above as a one-keystroke, two-character combination (0647 + 0654 = HEH and HAMZA ABOVE) on the keyboard. You may have noticed that when you type the one-keystroke, four-character Rial sign, you have to backspace four times to get rid of what you typed with a single keystroke.
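The keystroke-versus-character distinction can be sketched as follows. This is only a toy model in Python: the key names and the KEY_OUTPUT table are hypothetical, and it assumes an editor that deletes one code point per backspace.

```python
# Hypothetical key map: one key press emits a whole string.
KEY_OUTPUT = {
    "heh_hamzeh_key": "\u0647\u0654",        # HEH + HAMZA ABOVE
    "rial_key": "\u0631\u06CC\u0627\u0644",  # reh, Farsi yeh, alef, lam
}


def backspaces_needed(key: str) -> int:
    """Backspaces to erase one key press, assuming one code point each."""
    return len(KEY_OUTPUT[key])


print(backspaces_needed("heh_hamzeh_key"), backspaces_needed("rial_key"))
```

Under that assumption, the Rial key takes four backspaces to undo and a Heh+Hamza Above key would take two.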

Although there may be interest in providing an easy one-key function for multiple-character combinations (such as the common Mim+Yeh or Lam+Alif), the case of the Heh+Hamza Above is special in that it is falling into neglect among the younger generations and is therefore facing extinction. It is therefore highly recommended that it be given its own key, just so that everyone (especially those publishing scholarly texts) recognizes it as a Persian ligature.

If you want the full story on the technical details, here it is from Roozbeh Pournader:

That certain machine which contains some Unicode compliant software,
may decide to apply Unicode Normalization Forms converters to its input
data, which as Unicode says, is a completely Unicode-compliant thing to
do, and as W3C says, is *required* in some cases. These Normalization
Forms are specified in the Unicode Standard Annex UAX#15, at:

http://www.unicode.org/reports/tr15/

(All Unicode-compliant applications are asked then to treat any
normalization form of each certain string the same way.)

Now, if you have a string that contains the string U+06C0, and then one
converts it to the Normalization Form C, it will become the two-character
sequence <AE, HAMZA ABOVE>. That certain sequence is specified in the
Unicode data files, for example the one at:

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

which contains a line like:

06C0;ARABIC LETTER HEH WITH YEH ABOVE;Lo;0;AL;06D5 0654;;;;N;ARABIC LETTER HAMZAH ON HA;;;;

That line mentions "06D5 0654" which is of course <AE, HAMZA ABOVE>.
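The mapping can be checked with Python's standard unicodedata module. In this sketch, which is only an illustration with made-up variable names, it is the decomposed form (NFD) that produces the two-character sequence. NFC keeps U+06C0 as a single character, but it also composes <AE, HAMZA ABOVE> back into U+06C0, so the two are treated as equivalent, while <HEH, HAMZA ABOVE> is left alone.

```python
import unicodedata

HEH_YEH_ABOVE = "\u06C0"
data_line = ("06C0;ARABIC LETTER HEH WITH YEH ABOVE;Lo;0;AL;06D5 0654;;;;N;"
             "ARABIC LETTER HAMZAH ON HA;;;;")

# Field 5 of a UnicodeData.txt record is the decomposition mapping.
mapping_from_file = data_line.split(";")[5]
mapping_from_python = unicodedata.decomposition(HEH_YEH_ABOVE)

decomposed = unicodedata.normalize("NFD", HEH_YEH_ABOVE)     # AE + HAMZA ABOVE
recomposed = unicodedata.normalize("NFC", "\u06D5\u0654")    # back to U+06C0
heh_pair = unicodedata.normalize("NFC", "\u0647\u0654")      # stays two chars

print(mapping_from_file, mapping_from_python)
print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in recomposed])
print([f"U+{ord(c):04X}" for c in heh_pair])
```

So a normalizing application will never turn U+06C0 into the HEH-based sequence, nor the HEH-based sequence into U+06C0, which is the mismatch described above.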

 
