[mew-int 2933] Re: Identify urls

Christophe TROESTLER Christophe.Troestler at example.com
Sat Nov 6 01:01:34 JST 2010


On Fri, 5 Nov 2010 13:07:30 -0200, Diogo F.S.Ramos wrote:
> 
> > I am not sure allowing closing braces inside URLs is the way to go.
> > Sure some URLs contain braces but these are usually balanced.  I
> > personally use the following regex to allow "depth 1" matching braces.
> 
> You have a valid point, although I don't see the disadvantage of
> allowing closing braces inside URLs, but I guess it should correctly
> identify URLs like `(http://www.example.com/foo(bar))baz' as
> `http://www.example.com/foo(bar)'.

The problem with “(http://www.example.com/foo(bar))baz” is that it is
ambiguous.  Did the user mean that “http://www.example.com/foo(bar)”
is the URL and simply forget the space after the closing brace?  The
only URLs with braces I have seen are from Microsoft (MSDN), and
these have matching braces.  I have never seen URLs with two pairs of
braces like “http://www.example.com/foo(bar)(zzz)” or with nested braces
like “http://www.example.com/foo(b(a)r)”.

> Unfortunately I tried your solution with `re-builder' but it always
> stops recognizing the URL at a closing parenthesis. Could you verify
> whether it is working there for you?

You are correct; here is a better (although not yet perfect) version:

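;; U matches a run of characters allowed inside a URL (no whitespace,
;; ">", braces or quotes).  After the scheme, the last group repeats
;; either a "(U)" chunk or a U run whose final character is not ".,:".
(setq mew-regex-url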
  (let ((u "[^  	\n>()\"`'“”]*"))
    (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
	    "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
	    "\\((" u ")\\|" u "[^  	\n>()\"`'“”.,:]\\)+")))
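To check the behaviour outside Mew, something like the following can be
evaluated in *scratch*.  It is only a sketch: `my-url-regex' and
`my-url-match' are hypothetical names used here so that `mew-regex-url'
itself is left alone, and the whitespace inside the character classes is
assumed to be a plain space, a tab and a newline.

```elisp
;; Sketch only: `my-url-regex' and `my-url-match' are hypothetical names.
;; Assumption: the whitespace in the character classes is space, \t, \n.
(defvar my-url-regex
  (let ((u "[^ \t\n>()\"`'“”]*"))
    (concat "\\b\\(\\(\\(file\\|news\\|mailto\\):\\)"
            "\\|\\(\\(s?https?\\|ftp\\|gopher\\|telnet\\|wais\\)://\\)\\)"
            "\\((" u ")\\|" u "[^ \t\n>()\"`'“”.,:]\\)+"))
  "Local copy of the URL regex, for testing outside Mew.")

(defun my-url-match (text)
  "Return the first substring of TEXT matched by `my-url-regex', or nil."
  (when (string-match my-url-regex text)
    (match-string 0 text)))

;; One pair of braces, with an unbalanced closing brace after it.
(my-url-match "(http://www.example.com/foo(bar))baz")
;; Trailing punctuation after a balanced pair.
(my-url-match "see http://www.example.com/foo(bar).")
;; Nested braces, which the regex does not try to handle.
(my-url-match "http://www.example.com/foo(b(a)r)")
```

As far as I can tell, the first two calls return
"http://www.example.com/foo(bar)", while the nested case is cut short at
"http://www.example.com/foo", which is one of the reasons I call this
version not yet perfect.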

Best,
C.

