[Mew-dist 1608] one request to RFC 2047

Sun, 24 Aug 1997 21:03:06 JST

Hello Keith,

I tried to find you in Munich to discuss this issue but failed. The
following is a request concerning RFC 2047 which I sent you before and
to which, I think, I have not received a reply. I'm very sorry that I
did not find this problem while RFC 2047 was still an Internet-Draft.

RFC 2047 encoding for fields defined as *text, such as Subject:, has a
problem, I believe.

To my understanding, RFC2047 says:

(1) An 'encoded-word' that appears in a header field defined as
'*text' MUST be separated from any adjacent 'encoded-word' or 'text'
by 'linear-white-space'.

I believe this means that 
	Subject: English =?iso-2022-jp?B?encoded-Japanese?=
is illegal and we must (or should) use
	Subject: English
	 =?iso-2022-jp?B?encoded-Japanese?=
instead.

RFC 2047 also says:

(2) When displaying a particular header field that contains multiple
'encoded-word's, any 'linear-white-space' that separates a pair of
adjacent 'encoded-word's is ignored. However, this rule does not apply
to the case between *text and an 'encoded-word'.

Thus,
	Subject: English
	 =?iso-2022-jp?B?encoded-Japanese?=
is decoded as
	Subject: English Japanese.
Note that one space remains between English and Japanese in this
case.
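
As a rough illustration of rule (2), the following Python sketch decodes
a Subject: value, dropping white space between two adjacent
'encoded-word's but keeping it between plain text and an 'encoded-word'.
The names ENCODED_WORD, decode_word and decode_header_value are
hypothetical, and the regular expression is a simplification of the
RFC 2047 grammar, not a full implementation.

```python
import base64
import re

# Simplified pattern: "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_word(match):
    """Decode one matched encoded-word into a str."""
    charset, encoding, text = match.group(1), match.group(2), match.group(3)
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # Q encoding: "_" stands for a space, "=XX" is a hex-escaped octet.
        unescaped = re.sub(
            r"=([0-9A-Fa-f]{2})",
            lambda m: chr(int(m.group(1), 16)),
            text.replace("_", " "),
        )
        raw = unescaped.encode("latin-1")
    return raw.decode(charset)


def decode_header_value(value):
    """Decode a header value, applying the white-space rule (2)."""
    # Unfold: remove a line break that is followed by white space.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # White space between two encoded-words is ignored;
        # anything else (including "text ") is kept as-is.
        if not (prev_was_encoded and gap.strip() == ""):
            out.append(gap)
        out.append(decode_word(m))
        pos = m.end()
        prev_was_encoded = True
    out.append(value[pos:])
    return "".join(out)


japanese = "日本語"
word = ("=?iso-2022-jp?B?"
        + base64.b64encode(japanese.encode("iso-2022-jp")).decode("ascii")
        + "?=")
print(decode_header_value("English\n " + word))  # space is kept
print(decode_header_value(word + "\n " + word))  # space is dropped
```

The first call corresponds to the example above; the second shows that
the separating white space disappears only when both neighbours are
'encoded-word's.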

The problem is how to encode Subject: when the English and Japanese
parts are contiguous. An example is as follows:

	Subject: EnglishJapanese

The only solution I found is
	Subject: =?us-ascii?Q?English?=
		=?J?B?encoded-Japanese?=.
(Note that
	Subject: English=?J?B?encoded-Japanese?=
is not allowed by rule (1). )

However, this is discouraged by the third rule defined in RFC 2047:

(3) Use of 'encoded-word's to represent strings of purely ASCII
characters is allowed, but discouraged.
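
The workaround can be sketched in Python as below: the ASCII part is
wrapped in a us-ascii Q 'encoded-word' so that a decoder removes the
folding white space between the two parts. The helper names
(q_encode_ascii, b_encode, encode_contiguous) are hypothetical, and the
sketch ignores the 75-character limit on an 'encoded-word'.

```python
import base64


def q_encode_ascii(text):
    """Wrap pure-ASCII text in a us-ascii Q encoded-word."""
    out = []
    for byte in text.encode("ascii"):  # raises if text is not ASCII
        ch = chr(byte)
        if ch == " ":
            out.append("_")  # Q encoding writes a space as "_"
        elif ch in "=?_" or not (33 <= byte <= 126):
            out.append("=%02X" % byte)  # hex-escape special octets
        else:
            out.append(ch)
    return "=?us-ascii?Q?" + "".join(out) + "?="


def b_encode(text, charset):
    """Wrap text in a B (base64) encoded-word for the given charset."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?B?%s?=" % (charset, payload)


def encode_contiguous(ascii_part, other_part, charset="iso-2022-jp"):
    """Encode ASCII text immediately followed by non-ASCII text."""
    # Two adjacent encoded-words on folded lines; the white space
    # between them is ignored on display.
    return q_encode_ascii(ascii_part) + "\n " + b_encode(other_part, charset)


print("Subject: " + encode_contiguous("English", "日本語"))
```

This is exactly the usage that rule (3) discourages, which is the
conflict described here.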

Some mail utilities modify Subject:. A typical example is the case
where the name of a mailing list is prepended to Subject:.

	Subject: [ML name] text

If such utilities just prepend the ML name to a Japanese-only
Subject:, the result is a violation of RFC 2047:

	Subject: [ML name]=?J?B?encoded-Japanese?=

A reply message may contain the following Subject:

	Subject: =?US-ASCII?Q?[ML_name]?=
	 =?J?B?encoded-Japanese?=

Since this Subject: does not start with the literal string [ML name],
such utilities prepend the unnecessary string again:

	Subject: [ML name]=?US-ASCII?Q?[ML_name]?=
	 =?J?B?encoded-Japanese?=

<Proposed resolution>

I would propose eliminating rule (1). The intention of rule (1) is to
allow RFC 2047 decoders to recognize an 'encoded-word' easily. However,
in my implementation experience, this rule makes it much more difficult
to implement an encoder. Without rule (1), decoders can still find an
'encoded-word', for instance, with the help of a regular expression.
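
A minimal Python sketch of that idea follows. It assumes a simplified
pattern for an 'encoded-word' (the names ENCODED_WORD and split_header
are hypothetical) and shows that a regular expression can locate an
'encoded-word' even when no white space separates it from the
surrounding text.

```python
import re

# Simplified pattern: "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def split_header(value):
    """Split a header value into ("text", s) and ("encoded-word", s) tokens."""
    tokens = []
    pos = 0
    for m in ENCODED_WORD.finditer(value):
        if m.start() > pos:
            # Plain text found before this encoded-word.
            tokens.append(("text", value[pos:m.start()]))
        tokens.append(("encoded-word", m.group(0)))
        pos = m.end()
    if pos < len(value):
        tokens.append(("text", value[pos:]))
    return tokens


# No white space between the list tag and the encoded-word.
for kind, token in split_header("[ML name]=?iso-2022-jp?B?GyRCRnxLXDhsGyhC?="):
    print(kind, token)
```

One trade-off of this approach is that plain text which happens to look
like an 'encoded-word' would also be matched, so a real decoder would
need to decide how to treat such input.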

Harald told me in Munich that the Applications Area is planning to
revise RFC 2047 so that it can carry a language tag. I hope that the
proposed resolution will be included in the next spec.

--Kazu