Re: Redirecting paths with extra slashes

On Sat, 8 Dec 2007 15:40:09 +0100
Torsten Foertsch <torsten.foertsch@xxxxxxx> wrote:

> On Sat 08 Dec 2007, Christian Lerrahn wrote:
> > > RewriteEngine On
> > > RewriteRule (.*)//+(.*) $1$2 [R=permanent,L]
> >
> > Thanks for that. I'm sorry to still bother. I'd like to get rid of
> > paths like //foo/bar, too, which do not match with this rule. To be
> > honest I don't quite understand the rule. That's probably the reason
> > why I can't modify it correctly to match //foo/bar as well. When
> > I saw the regexp, I thought that I would end up without any slashes
> > but obviously I don't. Wouldn't /foo//bar/ match as
> > $1=/foo and $2=bar/ ? Why does it not match like that? Then also it
> > seems to me that (.*) should also match an empty string which would
> > mean that leading slashes would get stripped, too. Why does that
> > not happen?
> 
> You need to know that * in regexes is greedy. That means it consumes
> as many characters as it can while still letting the whole regexp
> match. So in /foo///bar, $1 gets /foo/ and not only /foo.
> 
> What you need for $1 is a nongreedy one (*? instead of *), something
> like this:
> 
> RewriteRule (.*?)//+(.*) $1/$2 ...
> 
> You can try this in a little Perl one-liner:
> 
> perl -ne 'BEGIN {$|=1; print "> "}
>   if (m!(.*?)//+(.*)!) {print "$1\t$2\n"} else {print "no match\n"}
>   print "> "'
> 
> It offers you a "> " prompt to enter a string that is matched against
> that regexp. Then $1 and $2 are printed, separated by a tab character.
> 
> You'll see that the new regexp matches even at the beginning of the
> line:
> 
> > /foo/bar
> no match
> > /foo//bar
> /foo    bar
> > /foo///bar
> /foo    bar
> > ///foo///bar
>         foo///bar
> > //foo//bar
>         foo//bar
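
To make the greedy versus non-greedy difference concrete, here is a minimal Python sketch that prints what each pattern captures for a few sample paths. It assumes Python's re module treats these two patterns the same way as the regex engine behind mod_rewrite; captures, GREEDY and LAZY are hypothetical names used only for illustration.

```python
import re

# The two patterns discussed above: greedy and non-greedy first group.
GREEDY = re.compile(r"(.*)//+(.*)")
LAZY = re.compile(r"(.*?)//+(.*)")


def captures(pattern, path):
    """Return ($1, $2) for an unanchored match, or None if nothing matches."""
    m = pattern.search(path)
    return m.groups() if m else None


for path in ["/foo/bar", "/foo//bar", "/foo///bar", "///foo///bar"]:
    print(path, "greedy:", captures(GREEDY, path), "lazy:", captures(LAZY, path))

# For /foo///bar the greedy form leaves one slash in $1:
#   greedy -> ('/foo/', 'bar')    lazy -> ('/foo', 'bar')
# For ///foo///bar the lazy form matches at the very start:
#   greedy -> ('///foo/', 'bar')  lazy -> ('', 'foo///bar')
```

The lazy results agree with the Perl session quoted above; the greedy column shows why the original rule only trimmed slashes near the end of the path.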

I realised that the matching was greedy and assumed that the question
mark would serve the same purpose as in Perl. However, ///foo/bar
should still match even if the pattern is greedy: there is no // between
foo and bar that a greedy match could prefer. Yet the rule does not
match the // at the beginning.

I actually was wrong in my last post. The rule

RewriteRule (.*/)/+(.*) $1$2 [R=permanent,L]

fixes almost all of my problems. The only problem that remains is that
the pattern doesn't match at the beginning of the path. The weird thing
is that a path like

//foo//bar

will get converted to /foo/bar in two redirects: first a match on the
leading // (//foo//bar -> /foo//bar), and then a match on the later
occurrence of // (/foo//bar -> /foo/bar). Now, this does not make any
sense to me. :(
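
As a cross-check of the rule above, here is a minimal Python sketch that applies the same pattern and $1$2 substitution outside Apache. It assumes Python's re module behaves like the regex engine behind mod_rewrite for this pattern; rewrite_once is a hypothetical helper, not something Apache provides. Under that assumption the regex on its own does match a leading //, and it would turn //foo//bar into //foo/bar first, which is not the order observed above. That suggests the difference comes from the string the server hands to the rule rather than from the regex itself, though nothing in this thread establishes that.

```python
import re

# Same pattern as the RewriteRule above; the substitution is $1$2.
RULE = re.compile(r"(.*/)/+(.*)")


def rewrite_once(path):
    """Apply the rule once; return the new path, or None if it does not match."""
    m = RULE.search(path)
    if m is None:
        return None
    return m.group(1) + m.group(2)


for path in ["/foo/bar", "/foo//bar", "//foo/bar", "//foo//bar"]:
    print(path, "->", rewrite_once(path))

# /foo/bar   -> None        (no doubled slash)
# /foo//bar  -> /foo/bar
# //foo/bar  -> /foo/bar    (the bare regex does match at the start)
# //foo//bar -> //foo/bar   (greedy $1 reaches the last doubled slash first)
```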

> The last 2 of the examples above reveal another problem with the
> approach. The RewriteRule matches only the first occurrence and then
> sends a redirect to the browser. If your URL contains multiple
> runs of consecutive slashes you may hit the browser's redirect
> limit.
> 
> To overcome that you can try to loop in mod_rewrite (untested):
> 
> RewriteRule (.*?)//+(.*) $1/$2 [E=R:$1/$2,N]
> 
> RewriteCond %{ENV:R} .
> RewriteRule . %{ENV:R} [R=permanent,L]

This doesn't matter too much to me. URLs with too many slashes in more
than one place are rather rare, so I'm OK with those resulting in more
than one redirect.
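
To illustrate the redirect-count concern, here is a minimal Python sketch that applies the non-greedy rule once per simulated redirect and compares that with collapsing every run of slashes in a single pass. redirect_chain, collapse and the limit parameter are hypothetical names for illustration; Python's re stands in for the regex engine in mod_rewrite, and none of this has been run against Apache.

```python
import re

# Non-greedy rule from the quoted suggestion; the substitution is $1/$2.
RULE = re.compile(r"(.*?)//+(.*)")


def redirect_chain(path, limit=20):
    """Return every path visited when the rule fires once per redirect."""
    chain = [path]
    for _ in range(limit):  # stand-in for a browser's redirect limit
        m = RULE.search(chain[-1])
        if m is None:
            break
        chain.append(m.group(1) + "/" + m.group(2))
    return chain


def collapse(path):
    """Collapse every run of two or more slashes in one pass."""
    return re.sub(r"/{2,}", "/", path)


chain = redirect_chain("///foo///bar")
print(chain)                        # ['///foo///bar', '/foo///bar', '/foo/bar']
print(len(chain) - 1, "redirects")  # 2 redirects
print(collapse("///foo///bar"))     # /foo/bar
```

Each run of slashes costs one redirect with the single-match rule, which is the case the looping [N] variant quoted above is meant to avoid.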

Cheers,
Christian

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx

