[mew-int 00685] problem while fetching mail

Fri Jan 11 16:59:38 JST 2002

The two attached mails can't be fetched from my
/var/spool/mail/sx0005 file.  When I press the `i' key, I get

  No new messages (2 messages left)

If I remove the first mail, everything works OK.  Looks like a bug in
one of mew's binaries...

This is not the first time I have seen this behaviour, so it isn't a
new bug.  Interestingly, it seems to depend on the file's time stamp
(but this is just my assumption).  While copying around the file for
testing purposes, I was suddenly able to fetch both mails.  Then, I
copied it again, touched it, and I got the buggy behaviour.
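
To help narrow this down, an independent check of the spool file may be
useful.  The sketch below is not mew's code; it assumes Python's standard
`mailbox` module, counts the messages that another mbox parser sees, and
reports the file's time stamps so that a working copy and a failing copy
can be compared.  The name `spool_summary` is made up for illustration.

```python
import mailbox
import os
import sys


def spool_summary(path):
    """Return message count, size, and time stamps of an mbox spool file."""
    st = os.stat(path)  # stat first: reading the file may update atime
    box = mailbox.mbox(path, create=False)
    try:
        count = len(box)  # counts messages by their "From " separator lines
    finally:
        box.close()
    return {
        "messages": count,
        "size": st.st_size,
        "mtime": st.st_mtime,
        "atime": st.st_atime,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python spool_summary.py /var/spool/mail/sx0005
    for key, value in spool_summary(sys.argv[1]).items():
        print(f"{key}: {value}")
```

If this reports two messages for a file on which mew reports none, the
separator lines are probably fine and the time stamps are the more likely
suspect; that inference is an assumption, not something verified against
mew.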

Hope that helps.

    Werner
-------------- next part --------------
>From linux-utf8-bounce at example.com  Fri Jan 11 01:48:35 2002
Return-Path: <linux-utf8-bounce at example.com>
Received: from localhost (localhost [127.0.0.1])
	by orion.local (8.11.6/8.11.6/SuSE Linux 0.5) with ESMTP id g0B0mZg28061
	for <sx0005 at example.com>; Fri, 11 Jan 2002 01:48:35 +0100
Envelope-to: 1509-242 at example.com
Delivery-date: Thu, 10 Jan 2002 10:35:03 +0100
Received: from pop.onlinehome.de
	by localhost with POP3 (fetchmail-5.9.0)
	for sx0005 at example.com (single-drop); Fri, 11 Jan 2002 01:48:35 +0100 (CET)
Received: from [199.232.76.164] (helo=fencepost.gnu.org)
	by mxng03.kundenserver.de with esmtp (Exim 3.22 #2)
	id 16Obbp-0003Uz-00
	for 1509-242 at example.com; Thu, 10 Jan 2002 10:34:53 +0100
Received: from humbolt.nl.linux.org ([131.211.28.48])
	by fencepost.gnu.org with esmtp (Exim 3.33 #1 (Debian))
	id 16Obbo-0004XO-00
	for <wl at example.com>; Thu, 10 Jan 2002 04:34:53 -0500
Received: from localhost.nl.linux.org ([IPv6:::ffff:127.0.0.1]:48534 "EHLO
	humbolt.") by humbolt.nl.linux.org with ESMTP id <S16137AbSAJJep>;
	Thu, 10 Jan 2002 10:34:45 +0100
Received: with LISTAR (v1.0.0; list linux-utf8); Thu, 10 Jan 2002 10:34:39 +0100 (CET)
Received: from mta7.pltn13.pbi.net ([IPv6:::ffff:64.164.98.8]:29432 "EHLO
	mta7.pltn13.pbi.net") by humbolt.nl.linux.org with ESMTP
	id <S16051AbSAJJeR>; Thu, 10 Jan 2002 10:34:17 +0100
Received: from there ([216.102.199.245])
 by mta7.pltn13.pbi.net (iPlanet Messaging Server 5.1 (built May  7 2001))
 with SMTP id <0GPP008F4U5J1H at example.com> for linux-utf8 at example.com;
 Thu, 10 Jan 2002 01:24:56 -0800 (PST)
Date: 	Thu, 10 Jan 2002 01:24:55 -0800
From: Edward Cherlin <cherlin at example.com>
Subject: Re: Unicode, character ambiguities
In-reply-to: <a1gimg$88p$1 at example.com>
To: linux-utf8 at example.com
Message-id: <0GPP008F5U5J1H at example.com>
Organization: Web for Humans
MIME-version: 1.0
X-Mailer: KMail [version 1.3.1]
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: 7BIT
References: <522121065.1010551480675.JavaMail.root at example.com>
 <a1gimg$88p$1 at example.com>
Original-Recipient: rfc822;linux-utf8 at example.com
X-listar-version: Listar v1.0.0
Sender: linux-utf8-bounce at example.com
Errors-to: linux-utf8-bounce at example.com
X-original-sender: cherlin at example.com
Precedence: bulk
Reply-to: linux-utf8 at example.com
List-help: <mailto:listar at example.com?Subject=help>
List-unsubscribe: <mailto:linux-utf8-request at example.com?Subject=unsubscribe>
List-software: Listar version 1.0.0
X-List-ID: <linux-utf8.nl.linux.org>
List-subscribe: 	<mailto:linux-utf8-request at example.com?Subject=subscribe>
List-owner: <mailto:riel at example.com>
List-post: <mailto:linux-utf8 at example.com>
List-archive: <http://mail.nl.linux.org/linux-utf8/>
X-list: 	linux-utf8
X-IMAPbase: 1010710858 74
Status: O
X-Status: 
X-Keywords:                      
X-UID: 1

On Tuesday 08 January 2002 08:58 pm, you wrote:
> Followup to:  <522121065.1010551480675.JavaMail.root at example.com>
> By author:    starner at example.com
> In newsgroup: linux.utf8
>
> > > Character Set Encoding of Tags:
> > > ===============================
> > >
> > > UTF-8 is the default encoding for tag data.  Unfortunately
> > > UTF-8 muffed it for Asian languages by doing the equivalent of
> > > giving the same character codes to English, Russian, and Greek
> > > letters.

I have followed this debate for several years. I think the Japanese 
experts in the Ideographic Rapporteur Group did a wonderful job on 
Han Unification. Even the Japanese anti-unification stalwarts admit 
that

(1) the characters they find troublesome would not have been
    considered difficult by anyone educated before 1950,

(2) there are only a small number of characters of concern, primarily
    U+76F4 (Mathews 1004, Nelson 775) and characters containing it, and

(3) the problem only arises when multilingual plain text is displayed
    in an inappropriate font.

Since the author can control the font in formatted text, and the user 
can control the font in viewing plain text, I don't see the problem. 
(To which opponents of unification reply, "_That_'s the problem", so 
we don't get anywhere.)

On the other hand, I have never heard anyone other than a 
mathematician complain about the unification of Fraktur and other 
writing styles for the Latin alphabet, even though Fraktur is 
extremely difficult for Americans and even younger Germans to read. 
Fraktur will evidently be disunified in Unicode 4.0 for use in 
variable names in math, but _not_ for text.

> > It's interesting that Japanese and Chinese, which are unrelated
> > languages, are sometimes mutually understandable when written,
> > but somehow use totally different scripts.

Almost all Japanese characters came from China, although there are a 
few indigenous creations such as the character for mountain pass 
(touge) and a number of simplified characters that appeared first in 
Japan. Kana is of course a purely Japanese invention.

> Also, there would hardly have been any hurt feelings if U+0041
> (Latin), U+0391 (Greek), and U+0410 (Cyrillic) had been unified. 
> It would just not have saved enough code points to bother.

Actually, it was impossible because of the source separation rule 
applied to Chinese, Korean, and Japanese encodings, including Big 
Five, GB2312, KSC, and JIS. (How ironic.) These standards include 
various combinations of Latin, Greek, and Cyrillic alphabets in 
separate code blocks alongside Hanzi, Zhuyin, Hangul, and kana. So 
LATIN CAPITAL LETTER A, CYRILLIC CAPITAL LETTER A, and GREEK CAPITAL 
LETTER ALPHA cannot be unified without breaking round-trip conversion 
for these standards. 
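
A short sketch of that constraint, assuming Python's bundled `euc_jp`
and `gb2312` codecs (chosen here only for illustration): it prints the
names of the three letters and checks that each legacy encoding gives
them three different byte sequences, which is what forces three
separate Unicode code points. The helper name `round_trips` is
hypothetical.

```python
import unicodedata

LETTERS = "\u0041\u0391\u0410"  # Latin A, Greek Alpha, Cyrillic A


def round_trips(text, encoding):
    """True if text survives encoding to a legacy charset and back."""
    return text.encode(encoding).decode(encoding) == text


for ch in LETTERS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

for enc in ("euc_jp", "gb2312"):
    encoded = [ch.encode(enc) for ch in LETTERS]
    # Three distinct byte sequences in the source standard need three
    # distinct code points in Unicode for a lossless round trip.
    print(enc, len(set(encoded)) == 3, round_trips(LETTERS, enc))
```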

On the other hand, Kurdish Cyrillic Q has been unified with Latin, 
since there was no pre-existing character set standard containing 
both at separate code points. There is, for example, no 
KOI-8-K(urdish).

> As far as "English" and "Russian" are concerned, the various
> Latin-script and the various Cyrillic-script languages have been
> unified for a long, long time.

Do you mean each group separate from the other? Otherwise I can't 
make sense of this.

> Things like U+212A and U+212B should never have been allowed to
> happen, on the other hand, IMNSHO.

(KELVIN SIGN and ANGSTROM SIGN)

Source separation rule, again. We're stuck with them in the standard, 
but we don't have to use them, or any of the other Compatibility 
Characters and Presentation Forms. Anyway, the true worst case was 
the encoding of more than 11,000 Hangul syllables, none of which is 
required.
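
Both points can be seen through Unicode normalization. The sketch
below assumes Python's standard `unicodedata` module: the two signs
have singleton canonical decompositions, so NFC replaces them with the
ordinary letters, and a precomposed Hangul syllable decomposes under
NFD into conjoining jamo, which is the sense in which the syllable
block is not strictly required.

```python
import unicodedata

KELVIN_SIGN = "\u212A"
ANGSTROM_SIGN = "\u212B"

# NFC maps the two signs to the ordinary letters.
kelvin_nfc = unicodedata.normalize("NFC", KELVIN_SIGN)      # U+004B
angstrom_nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)  # U+00C5

# A precomposed Hangul syllable decomposes into conjoining jamo under
# NFD and is recomposed from them under NFC.
syllable = "\uAC00"  # HANGUL SYLLABLE GA
jamo = unicodedata.normalize("NFD", syllable)  # U+1100 U+1161

# 19 leading consonants x 21 vowels x 28 trailing options (incl. none)
SYLLABLE_COUNT = 19 * 21 * 28

for label, text in (("kelvin", kelvin_nfc),
                    ("angstrom", angstrom_nfc),
                    ("jamo", jamo)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in text))
```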

> 	-hpa

Better an imperfect but entirely usable standard than no standard.

-- 
Edward Cherlin
edward at example.com
Does your Web site work?
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/

>From linux-utf8-bounce at example.com  Fri Jan 11 01:48:37 2002
Return-Path: <linux-utf8-bounce at example.com>
Received: from localhost (localhost [127.0.0.1])
	by orion.local (8.11.6/8.11.6/SuSE Linux 0.5) with ESMTP id g0B0mbg28064
	for <sx0005 at example.com>; Fri, 11 Jan 2002 01:48:37 +0100
Envelope-to: 1509-242 at example.com
Delivery-date: Thu, 10 Jan 2002 10:43:53 +0100
Received: from pop.onlinehome.de
	by localhost with POP3 (fetchmail-5.9.0)
	for sx0005 at example.com (single-drop); Fri, 11 Jan 2002 01:48:37 +0100 (CET)
Received: from [199.232.76.164] (helo=fencepost.gnu.org)
	by mxng04.kundenserver.de with esmtp (Exim 3.22 #2)
	id 16ObkW-0000St-00
	for 1509-242 at example.com; Thu, 10 Jan 2002 10:43:53 +0100
Received: from humbolt.nl.linux.org ([131.211.28.48])
	by fencepost.gnu.org with esmtp (Exim 3.33 #1 (Debian))
	id 16ObkW-0004sW-00
	for <wl at example.com>; Thu, 10 Jan 2002 04:43:52 -0500
Received: from localhost.nl.linux.org ([IPv6:::ffff:127.0.0.1]:8088 "EHLO
	humbolt.") by humbolt.nl.linux.org with ESMTP id <S16232AbSAJJno>;
	Thu, 10 Jan 2002 10:43:44 +0100
Received: with LISTAR (v1.0.0; list linux-utf8); Thu, 10 Jan 2002 10:43:40 +0100 (CET)
Received: from h00805f1531c2.ne.mediaone.net ([IPv6:::ffff:24.91.104.252]:57608
	"EHLO h0040333b7dc3.ne.mediaone.net") by humbolt.nl.linux.org
	with ESMTP id <S16051AbSAJJnd>; Thu, 10 Jan 2002 10:43:33 +0100
Received: by h0040333b7dc3.ne.mediaone.net (Postfix, from userid 1000)
	id 94A6D10C92EB6; Thu, 10 Jan 2002 04:46:42 -0500 (EST)
Date: 	Thu, 10 Jan 2002 04:46:42 -0500
From: Glenn Maynard <g_lutf8 at example.com>
To: linux-utf8 at example.com
Subject: Re: Unicode, character ambiguities
Message-ID: <20020110094642.GA5255 at example.com>
Mail-Followup-To: linux-utf8 at example.com
References: <20020109040335.GA16945 at example.com> <20020109124331.A14678 at example.com> <200201091228.g09CSV821391 at example.com> <200201100556.g0A5usb18767 at example.com> <a1jh8h$scr$1 at example.com> <200201100915.g0A9F0j23627 at example.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200201100915.g0A9F0j23627 at example.com>
User-Agent: Mutt/1.3.25i
Mail-Copies-To: 	nobody
Original-Recipient: rfc822;linux-utf8 at example.com
X-listar-version: Listar v1.0.0
Sender: linux-utf8-bounce at example.com
Errors-to: linux-utf8-bounce at example.com
X-original-sender: g_lutf8 at example.com
Precedence: bulk
Reply-to: linux-utf8 at example.com
List-help: <mailto:listar at example.com?Subject=help>
List-unsubscribe: <mailto:linux-utf8-request at example.com?Subject=unsubscribe>
List-software: Listar version 1.0.0
X-List-ID: <linux-utf8.nl.linux.org>
List-subscribe: 	<mailto:linux-utf8-request at example.com?Subject=subscribe>
List-owner: <mailto:riel at example.com>
List-post: <mailto:linux-utf8 at example.com>
List-archive: <http://mail.nl.linux.org/linux-utf8/>
X-list: 	linux-utf8
Status: O
X-Status: 
X-Keywords:                  
X-UID: 2

> Speaking of round-trip compatibility: yes, round-trip compatibility
> for EUC-JP, EUC-KR, Big5, GB2312, and GBK is guaranteed, i.e., Unicode
> is a superset of these encodings (character sets).  However,
> (1) there are no authoritative mapping tables between these encodings
>     and Unicode, and there are various private mapping tables.  This
>     can cause portability problems with round-trip compatibility.

On Thu, Jan 10, 2002 at 06:18:10PM +0900, Tomohiro KUBOTA wrote:
> As a glyph reference, I am using
> http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=f9b1
> and so on.  Otherwise, the displayed glyphs depend on the system and
> we cannot be sure we are discussing the same glyph.

By the way, I notice this page contains "mappings to major standards",
including Big5, GB 2312, and JIS.  I assume there are established
mappings between JIS and EUC-JP, and probably for the other languages'
encodings, too.

"The mappings to major standards have been exhaustively proofed and are
a normative part of Unicode." (http://www.unicode.org/charts/unihan.html)

"A normative part of Unicode" sounds very authoritative; am I missing
something?
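
For what the portability concern in point (1) can look like in
practice, one well-known case is the JIS X 0208 wave dash.  The sketch
below assumes Python's bundled `shift_jis` and `cp932` codecs, which
follow different mapping tables for the same two bytes (Shift_JIS is
used here as a stand-in for the JIS-based encodings).  It only
illustrates the divergence and does not settle which table is
authoritative; the helper names are hypothetical.

```python
WAVE_DASH_BYTES = b"\x81\x60"  # JIS X 0208 row 1, cell 33 in Shift_JIS form


def decode_with(encoding):
    """Decode the same two bytes under a given mapping table."""
    return WAVE_DASH_BYTES.decode(encoding)


def survives(text, encoding):
    """True if text can be encoded and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


jis_view = decode_with("shift_jis")  # U+301C WAVE DASH
ms_view = decode_with("cp932")       # U+FF5E FULLWIDTH TILDE

print(f"shift_jis: U+{ord(jis_view):04X}")
print(f"cp932:     U+{ord(ms_view):04X}")
# Text decoded with one table may not re-encode with the other.
print("cp932 text encodable as shift_jis:", survives(ms_view, "shift_jis"))
```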

-- 
Glenn Maynard