TeX and Japanese encodings on Mac OS X


In the traditional Japanese encodings, the code point that ASCII uses for the backslash was used for the yen symbol (¥). With the advent of Unicode, this can create a bit of a mess for Japanese users, and resolving the issue requires some attention from developers.

Yuji Tachikawa, Maarten Sneep

Editor’s note: This text is largely based on an email message, which (hopefully) explains the informal tone of this text.

As you know, one of the most frequently used characters in TeX is the backslash \. However, if you buy a textbook on TeX in Japan, you’re likely to see it using the yen symbol ¥ instead of the backslash \. This has a historical origin, from the time when there was no wide character support. The Japanese standards committee, which saw the need to include the currency symbol in the 8-bit character code, modified the original ASCII table a little and placed the yen at the code point for the backslash, since the \ is used relatively infrequently, as long as you don’t do TeX.

Hence, all three major encodings (the EUC code used on Unix systems, the SJIS code used in MS products, and the JIS code used on the Internet before Unicode became widespread) used the yen (¥) at the code point where others placed the backslash (\).

As the TeX implementation just sees the ASCII code for characters, we are used to typing ¥documentclass instead of \documentclass, etc. (A related example is the MS-style file path: c:\Program Files\ becomes c:¥Program Files¥. Korean encodings used the won symbol ₩ at the code point for \, so Korean users type ₩documentclass.) If this were the situation in Mac OS X, we would be in a much better position!

In Classic Mac OS, before OS X, the preferred Japanese encoding was MacJapanese. It is a slight modification of SJIS: it has the yen symbol at the ASCII code point for the backslash, and the backslash symbol at a separate position. So in the classic days, we used ¥documentclass and it worked great out of the box. What changed things was the introduction of Mac OS X and its use of Unicode. At first I wondered why the adoption of Unicode made things worse, but it turned out as follows:

In OS X, most string handling is done using NSString or CFString. In theory, such a string contains Unicode characters in all cases. For speed and efficiency, however, NSString is a class cluster with classes for “ASCII”-like strings, “really Unicode” strings, and so on. When you use NSString’s -writeToFile:atomically:, such “ASCII”-like strings are automatically converted to Unicode and written to the file. So far so good. The point is that the encoding for the “ASCII”-like strings depends on the runtime configuration in the International pane of System Preferences: ASCII is used in an English environment, and MacJapanese is used in a Japanese environment. Additionally, hard-coded strings in the app binary are interpreted in that default “ASCII”-like encoding.
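The effect of a runtime-dependent “ASCII”-like encoding can be sketched in a few lines of Python. This is only an illustration, not the Cocoa implementation: the two tables and the function `decode_single_byte` are hypothetical names, and the tables cover nothing but the one code point that matters here.

```python
# Hypothetical, partial decoding tables. Only byte 0x5C differs between them.
ASCII_LIKE = {0x5C: "\\"}            # U+005C REVERSE SOLIDUS
MACJAPANESE_LIKE = {0x5C: "\u00a5"}  # U+00A5 YEN SIGN


def decode_single_byte(data, overrides):
    """Decode 7-bit bytes to a Unicode string, applying per-encoding overrides."""
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("this sketch handles 7-bit bytes only")
        chars.append(overrides.get(byte, chr(byte)))
    return "".join(chars)


# The bytes as they would sit in the application binary.
hard_coded = b"\\documentclass"

english = decode_single_byte(hard_coded, ASCII_LIKE)
japanese = decode_single_byte(hard_coded, MACJAPANESE_LIKE)

# Print the code point of the first character under each interpretation.
print(hex(ord(english[0])))   # 0x5c
print(hex(ord(japanese[0])))  # 0xa5
```

The bytes are identical in both cases; only the table chosen at run time decides whether the first character becomes a backslash or a yen sign.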

The end result is this: when the source file of your snippet editor contains @"\documentclass" or something similar, the OS X runtime in a Japanese environment understands it as @"¥documentclass" in Unicode. When it is written to a temporary file to be processed by TeX, the file contains the character ¥ in Unicode. This obviously causes TeX to fail when you run it later on.
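To see why the temporary file breaks, it helps to look at the bytes. The following Python sketch assumes the file is written as UTF-8 (the text above only says “Unicode”); the helper `find_yen_signs` is a hypothetical name for a check a developer might run before handing the file on.

```python
# What the author meant, and what the Japanese runtime produced.
backslash_source = "\\documentclass{article}"
yen_source = "\u00a5documentclass{article}"

# In UTF-8 the backslash is the single byte 5c, the yen sign is the pair c2 a5.
print(backslash_source.encode("utf-8")[:1].hex())  # 5c
print(yen_source.encode("utf-8")[:2].hex())        # c2a5


def find_yen_signs(text):
    """Return the indices of every U+00A5 YEN SIGN in the text."""
    return [i for i, ch in enumerate(text) if ch == "\u00a5"]


print(find_yen_signs(yen_source))        # [0]
print(find_yen_signs(backslash_source))  # []
```

The byte 5c that introduces a control sequence never reaches the file in the second case, so nothing in it looks like a command.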

As most Cocoa apps are Unicode-aware, if you input a backslash in the text field, it is correctly reflected in the temporary .tex file. The complication here is that many keyboards in Japan do not have a backslash key; they only have the yen symbol, and you need to press Option-Yen to input the backslash. This is quite cumbersome, especially if you are writing a mathematical expression [1]. In Tiger, the standard Japanese input method, Kotoeri, finally has the option of inputting a backslash just by pressing the Yen key; the yen symbol can then be input with Option-Yen.

Many Japanese switchers to the Mac do not know about these yen-and-backslash issues, and as I said, most textbooks on TeX and LaTeX here tell us to use the yen symbol, never mentioning the backslash. Thus we need to support a yen-to-backslash translator in the text field. You may have noticed that I made an NSValueTransformer to facilitate this in Type. Type is released under the BSD license, and the source can be found on its home page. Yuji Tachikawa can be reached through the Type home page.
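The idea behind such a translator can be sketched as a pair of functions, one for each direction, much as a reversible value transformer would offer. This is a Python illustration only, not the NSValueTransformer from Type; the names `yen_to_backslash` and `backslash_to_yen` are hypothetical.

```python
YEN = "\u00a5"    # U+00A5 YEN SIGN
BACKSLASH = "\\"  # U+005C REVERSE SOLIDUS


def yen_to_backslash(text):
    """Replace every yen sign with a backslash before the text goes to TeX."""
    return text.replace(YEN, BACKSLASH)


def backslash_to_yen(text):
    """Reverse direction, for showing stored source to a user who expects yen."""
    return text.replace(BACKSLASH, YEN)


typed = "\u00a5frac{1}{2} + \u00a5alpha"
for_tex = yen_to_backslash(typed)
print(for_tex)  # \frac{1}{2} + \alpha
```

Note that the translation is deliberately blunt: a yen sign that was really meant as a currency symbol is converted too, so a round trip through both functions does not necessarily give back the original text.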

There are other implementations around, notably in Shop, which is released under the GPL.