[mew-int 2934] Re: Identify urls
Diogo F. S. Ramos
diogofsr at example.com
Sat Nov 6 03:40:34 JST 2010
> On Fri, 5 Nov 2010 13:07:30 -0200, Diogo F.S.Ramos wrote:
>>
>> > I am not sure allowing closing braces inside URLs is the way to go.
>> > Sure some URLs contain braces but these are usually balanced. I
>> > personally use the following regex to allow "depth 1" matching braces.
>>
>> You have a valid point, although I don't see the disadvantage of
>> allowing closing braces inside URLs, but I guess it should correctly
>> identify URLs like `(http://www.example.com/foo(bar))baz' as
>> `http://www.example.com/foo(bar)'.
>
> The problem with “(http://www.example.com/foo(bar))baz” is that it is
> ambiguous. Did the user want to say that “http://www.example.com/foo(bar)”
> is the URL and forgot the space after the closing brace?

Yes, it certainly is ambiguous. This is probably the main difficulty
(OK, that is too broad a statement) in identifying them correctly. I
think the best we can do is create a regex that looks for the common
combinations found in mail text, such as (url) or <url>.
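
As a rough sketch of that idea, the function below looks only for URLs
wrapped as (url) or <url> and returns the part between the delimiters.
The name `my-delimited-urls', the restriction to http and https, and the
choice to allow one level of balanced parentheses inside a parenthesized
URL are all assumptions made for illustration; none of this is Mew's
actual code.

```elisp
;; Hypothetical helper, not part of Mew.
(defun my-delimited-urls (text)
  "Return a list of URLs in TEXT that are wrapped as (url) or <url>."
  (let ((regex (concat
                ;; <url>: group 1 is everything between the angle brackets.
                "<\\(https?://[^ \n<>]+\\)>"
                "\\|"
                ;; (url): group 2 is the URL; it may contain "depth 1"
                ;; balanced parentheses such as foo(bar).
                "(\\(https?://[^ \n()]+\\(?:([^ \n()]*)[^ \n()]*\\)*\\))"))
        (start 0)
        (urls '()))
    ;; Scan the string from left to right, collecting each match.
    (while (string-match regex text start)
      (push (or (match-string 1 text) (match-string 2 text)) urls)
      (setq start (match-end 0)))
    (nreverse urls)))

(my-delimited-urls "(http://www.example.com/foo(bar))baz")
;; => ("http://www.example.com/foo(bar)")
```

Requiring the outer delimiter sidesteps the ambiguity for wrapped URLs,
but a bare URL in running text is not matched at all by this sketch.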
> The only case of URLs with braces I have seen are from Microsoft
> (MSDN) and these have matching braces. I have never seen URLs with
> two braces like “http://www.example.com/foo(bar)(zzz)” or like
> “http://www.example.com/foo(b(a)r)”.

Hmm, I can't recall ever seeing those either, but I have certainly seen
URLs with a single pair of opening and closing parentheses, which was
actually the reason for my initial investigation.

As I said, I like your solution for matching opening and closing
parentheses, but didn't mine work too?

>> Unfortunately I tried your solution with `re-builder' but it always
>> stops recognizing the URL at a closing parenthesis. Could you verify
>> whether it is working there for you?
>
> You are correct; here is a better (although not yet perfect) version:
>
> (setq mew-regex-url
> (let ((u "[^ \n>()\"`'“”]*"))
> (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
> "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
> "\\((" u ")\\|" u "[^ \n>()\"`'“”.,:]\\)+")))

I couldn't evaluate your `concat' expression with C-x C-e, but I think I
got the idea.
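
One possible reason, offered only as a guess: the `concat' form refers
to `u', which is bound by the surrounding `let', so evaluating the inner
form alone would signal a void-variable error, and the "> " quote
prefixes would also need to be removed first. The sketch below binds the
same pattern under a separate name and tries it on the example from this
thread. `my-url-regex' and `my-first-url' are hypothetical names; only
the pattern itself comes from the quoted message.

```elisp
;; Hypothetical names; the pattern is copied from the quoted message.
(defvar my-url-regex
  (let ((u "[^ \n>()\"`'“”]*"))
    (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
            "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
            ;; Either a balanced "(...)" chunk, or a run of URL characters
            ;; whose last character is not punctuation such as "." or ",".
            "\\((" u ")\\|" u "[^ \n>()\"`'“”.,:]\\)+"))
  "URL regex from the quoted message, kept apart from `mew-regex-url'.")

(defun my-first-url (text)
  "Return the first substring of TEXT matched by `my-url-regex', or nil."
  (when (string-match my-url-regex text)
    (match-string 0 text)))

(my-first-url "(http://www.example.com/foo(bar))baz")
;; => "http://www.example.com/foo(bar)"
```

Evaluating the whole `defvar' form (or the whole `setq' in the quoted
version) keeps `u' in scope while `concat' runs.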
--
Diogo F. S. Ramos