[mew-int 01607] Re: windows 1252

Kazu Yamamoto ( 山本和彦 ) kazu at example.com
Mon Nov 10 16:11:23 JST 2003


Hello Handa-san,

Thank you for your explanation.

> (2) ctext (alias of compound-text)
> 
> On conversion, it is not fully compatible with the
> specification of X Compound Text because it encodes any
> Emacs characters while using a designation sequence for
> private character sets (please note that all Emacs charsets
> have an iso-final-char).  So, Big5 characters are preceded by
> ESC $ ( 0 or 1, mule-unicode-0100-24ff characters are
> preceded by ESC - 1.
              ^^^^^^^

Let me ask two questions to clarify.

Q1) It seems to me that Emacs encodes mule-unicode-0100-24ff with ESC
$ - 1. But the explanation above says ESC - 1. Which one is correct
according to Emacs's spec?
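
To make the difference between the two candidates visible, here is a
small Python sketch that prints both in the column/row notation used by
the ctext spec. The helper name col_row is hypothetical and used only for
illustration; the sketch shows the byte values and does not claim
which sequence Emacs actually emits.

```python
def col_row(data: bytes) -> str:
    """Render bytes in ISO 2022 column/row notation, e.g. ESC -> 01/11."""
    return " ".join(f"{b >> 4:02d}/{b & 0x0F:02d}" for b in data)

ESC = b"\x1b"
with_dollar = ESC + b"$-1"     # ESC $ - 1
without_dollar = ESC + b"-1"   # ESC - 1

print(col_row(with_dollar))     # 01/11 02/04 02/13 03/01
print(col_row(without_dollar))  # 01/11 02/13 03/01
```

The only difference is the 02/04 byte ("$"), which ISO 2022 uses in
designations of multiple-byte character sets.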

Q2) I don't think it's a good idea to expose the internal
representation "mule-unicode-0100-24ff" in a file. According to the
ctext spec provided with XFree86, ctext has an extension for UTF-8:

---
7.  The UTF-8 encoding

Unicode characters that are not contained in one of the
approved standard encodings can be encoded using the UTF-8
encoding. The following escape sequences are used:

     01/11 02/05 04/07   switch into UTF-8 mode
     01/11 02/05 04/00   return from UTF-8 mode

The first is the ISO registered sequence for UTF-8
(ISO-IR-196), the second is the ISO-2022 ``standard return''
sequence. While in UTF-8 mode, the UTF-8 encoding replaces
the currently designated GL and GR encodings. After return
from UTF-8 mode, the previously designated GL and GR
encodings are reactivated.
---

How about using this to encode mule-unicode-0100-24ff?
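
As a rough illustration of what I mean, here is a minimal Python sketch
that turns the two sequences quoted above into bytes and wraps a piece
of text in UTF-8 mode. The names from_col_row and wrap_utf8 are
hypothetical, and this is only a sketch of the byte layout under the
quoted section, not how Emacs implements ctext.

```python
def from_col_row(notation: str) -> bytes:
    """Convert column/row notation such as '01/11 02/05 04/07' to bytes."""
    out = bytearray()
    for item in notation.split():
        col, row = item.split("/")
        out.append(int(col) * 16 + int(row))
    return bytes(out)

# The two sequences from section 7 of the quoted spec.
UTF8_ENTER = from_col_row("01/11 02/05 04/07")  # ESC % G
UTF8_LEAVE = from_col_row("01/11 02/05 04/00")  # ESC % @

def wrap_utf8(text: str) -> bytes:
    """Emit text as UTF-8 between the switch and return sequences."""
    return UTF8_ENTER + text.encode("utf-8") + UTF8_LEAVE

# U+0100 is the first code point covered by mule-unicode-0100-24ff.
print(wrap_utf8("\u0100"))  # b'\x1b%G\xc4\x80\x1b%@'
```

With this layout the file contains only sequences described in the
spec, and no Emacs-private designation appears in it.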

> When it runs under emacs-unicode version, on writing the
> file, if all the characters can be encoded by ctext, keep
> using it.  If not (because, in emacs-unicode, some characters
> don't belong to any charset that has iso-final-char), use
> utf-8.  And in both cases, add a coding tag.  On reading,
> check the coding tag first.  If no coding tag, read by
> ctext, otherwise, read by the coding system specified in the
> tag.

I remember that, some years ago, Handa-san told me, "The current
Emacs is using mule-unicode but will migrate to Unicode".  But I don't
know exactly what emacs-unicode refers to. Is it particular versions,
or a different source tree?

--Kazu


