[mew-int 2934] Re: Identify urls

Diogo F. S. Ramos diogofsr at example.com
Sat Nov 6 03:40:34 JST 2010


> On Fri, 5 Nov 2010 13:07:30 -0200, Diogo F.S.Ramos wrote:
>> 
>> > I am not sure allowing closing braces inside URLs is the way to go.
>> > Sure some URLs contain braces but these are usually balanced.  I
>> > personally use the following regex to allow "depth 1" matching braces.
>> 
>> You have a valid point, although I don't see the disadvantage of
>> allowing closing braces inside URLs, but I guess it should correctly
>> identify URLs like `(http://www.example.com/foo(bar))baz' as
>> `http://www.example.com/foo(bar)'.
> 
> The problem with “(http://www.example.com/foo(bar))baz” is that it is
> ambiguous.  Did the user want to say that “http://www.example.com/foo(bar)”
> is the URL and forgot the space after the closing brace?

Yes, it certainly is ambiguous. That is probably the main difficulty
(OK, perhaps too broad a statement) in identifying URLs correctly. I
think the best we can do is write a regex that looks for the common
ways URLs are wrapped in mail text, such as (url) or <url>.
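
To make the idea concrete, here is a rough sketch of what I mean by
looking for the wrappers first. The function name `my-wrapped-url' is
made up for illustration and is not part of Mew, and I only handle
http and https here to keep it short.

```elisp
;; Hypothetical helper for illustration only; not part of Mew.
(defun my-wrapped-url (text)
  "Return a URL in TEXT that is wrapped as <url> or (url), or nil."
  (cond
   ;; <url>: take everything up to the closing angle bracket.
   ((string-match "<\\(https?://[^ \t\n<>]+\\)>" text)
    (match-string 1 text))
   ;; (url): the wrapper's closing parenthesis must be followed by
   ;; whitespace, punctuation, or the end of the text.
   ((string-match "(\\(https?://[^ \t\n]+\\))\\(?:[ \t\n.,;:]\\|\\'\\)" text)
    (match-string 1 text))))

(my-wrapped-url "see <http://www.example.com/foo(bar)>.")
;; => "http://www.example.com/foo(bar)"
(my-wrapped-url "(http://www.example.com/foo(bar)) baz")
;; => "http://www.example.com/foo(bar)"
(my-wrapped-url "(http://www.example.com/foo(bar))baz")
;; => nil
```

The last call is the ambiguous case from above: with no space after
the wrapper, this sketch simply declines to guess.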

> The only case of URLs with braces I have seen are from Microsoft
> (MSDN) and these have matching braces.  I have never seen URLs with
> two braces like “http://www.example.com/foo(bar)(zzz)” or like
> “http://www.example.com/foo(b(a)r)”.

Hmm, I can't recall ever seeing those either, but I have certainly
seen URLs with a single pair of opening and closing parentheses, which
was actually the reason for my initial investigation.

But, as I said, I like your solution for matching opening and closing
parentheses. Didn't mine work too, though?
 
>> Unfortunately I tried your solution with `re-builder' but it always
>> stops recognizing the URL at a closing parenthesis. Could you verify
>> that it is working there for you?
> 
> You are correct; here is a better (although not yet perfect) version:
> 
> (setq mew-regex-url
>   (let ((u "[^  	\n>()\"`'“”]*"))
>     (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
> 	    "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
> 	    "\\((" u ")\\|" u "[^  	\n>()\"`'“”.,:]\\)+")))

I couldn't eval your concat statement with C-x C-e, but I guess I got
the idea.
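
One possible reason for the eval trouble: `u' is bound by the
enclosing `let', so evaluating only the inner `concat' form would
signal a void-variable error unless `u' happens to be bound already.
Below is a minimal sketch for trying the regex with `string-match'
outside `re-builder'. The names `my-url-regex' and `my-url-at' are
made up for illustration, and I simplified the character classes to
space, tab, newline, and the ASCII delimiters (no curly quotes), so it
is not an exact copy of the version quoted above.

```elisp
;; Sketch only: `my-url-regex' and `my-url-at' are hypothetical names,
;; not part of Mew.  The character classes are simplified (see above).
(defvar my-url-regex
  (let ((u "[^ \t\n>()\"`']*"))
    (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
            "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
            ;; Either one balanced "(...)" group, or a run of ordinary
            ;; characters that does not end in ".", "," or ":".
            "\\((" u ")\\|" u "[^ \t\n>()\"`'.,:]\\)+"))
  "Trial URL regex allowing depth-1 balanced parentheses.")

(defun my-url-at (text)
  "Return the first substring of TEXT matched by `my-url-regex', or nil."
  (when (string-match my-url-regex text)
    (match-string 0 text)))

(my-url-at "(http://www.example.com/foo(bar))baz")
;; => "http://www.example.com/foo(bar)"
(my-url-at "see http://www.example.com/foo.")
;; => "http://www.example.com/foo"
(my-url-at "http://www.example.com/foo(b(a)r)")
;; => "http://www.example.com/foo"
```

With this simplified version the nested case is cut at the first
opening parenthesis, which matches the "depth 1" intent.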

-- 
Diogo F. S. Ramos

