Re: mod_substitute only replaces first pattern match

Hi!

2017-02-06 17:25 GMT+01:00 <Uwe.Poliak@xxxxxxxxx>:
Hi,

I am setting up a reverse proxy server based on Apache httpd 2.4 on the most recent release of CentOS:

# httpd -version
Server version: Apache/2.4.6 (CentOS)
Server built:   Nov 14 2016 18:04:44

# uname -a
Linux hostname.domain.tld 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)

Within this configuration I have to use mod_substitute to rewrite URLs from some applications.
For this I am using mod_filter with the SUBSTITUTE Filter as follows:

  ProxyRequests Off
  ProxyPass /my-location https://my-server.domain.tld/

  <Location /my-location/>
    ProxyPassReverse    /my-location

    FilterDeclare       AGFILTER

    FilterProvider      AGFILTER SUBSTITUTE "%{resp:Content-Type} =~ m#^text/html#"
    FilterProvider      AGFILTER SUBSTITUTE "%{resp:Content-Type} =~ m#.*/css#"
    FilterProvider      AGFILTER SUBSTITUTE "%{resp:Content-Type} =~ m#.*/json#"
    FilterProvider      AGFILTER SUBSTITUTE "%{resp:Content-Type} =~ m#.*/javascript#"

    FilterChain         AGFILTER

    Substitute          "s#/(css|js|images|management|system|help)/(.*)#/my-location/$1/$2#fi"
  </Location>

It works fine if there is only one occurrence of the search pattern in a line of the HTML code: that occurrence is replaced properly.
However, if there are two or more occurrences of the search pattern in one HTML line, only the first one is replaced. It looks like this example:

<tr><th colspan=3 nowrap></th><th colspan=3 nowrap><a href="">img border=0 src="">gif" alt=" Spalte ausblenden"></a> <a href=""><img src="" border=0 alt=" Spalte nach rechts schieben"></a></th><th colspan=3 nowrap><a href="">><img src="" alt=" Spalte nach links schieben" border=0 ></a> <a ....

Here you see: The first one is replaced, the second image URL is the same as before.

Is this working as designed?

I think the issue is the (.*) in your regex. In your example it matches the first occurrence of the pattern (like "/images/") and then ends up eating all the remaining characters on the line, because .* is greedy (as far as I can see). That leaves nothing for a second match on the same line. The following matches more than one occurrence in your example string, because it checks for the .something extension:

(css|js|images|management|system|help)\/(\w+\.\w)

So mod_substitute seems to be working fine; the regex just needs a bit of tuning, imo. The documentation might need to mention the greedy behavior, but I need to triple-check that what I just said makes sense :)
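
To make the greedy behavior easier to see outside of httpd, here is a rough sketch in Python. It is only an illustration: Python's re module is not the regex engine httpd uses, the sample line is made up (the example line in the original mail did not survive the archive), and the bounded character class [^"'\s>]* is just one possible way to stop the match at the end of the URL, not necessarily the right pattern for your application. Note that Python writes backreferences as \1 and \2 where Substitute uses $1 and $2.

```python
import re

# Hypothetical HTML line with two image URLs on the same line.
line = '<a href="/images/a.gif">x</a> <a href="/images/b.gif">y</a>'

# Same shape as the pattern in the Substitute directive: (.*) is greedy
# and runs to the end of the line after the first "/images/".
greedy = re.compile(r'/(css|js|images|management|system|help)/(.*)', re.IGNORECASE)

# Assumed alternative: stop at the first quote, whitespace or ">" so that
# each URL is matched on its own.
bounded = re.compile(r'/(css|js|images|management|system|help)/([^"\'\s>]*)', re.IGNORECASE)

replacement = r'/my-location/\1/\2'

# subn returns the rewritten string and the number of replacements made.
greedy_out, greedy_count = greedy.subn(replacement, line)
bounded_out, bounded_count = bounded.subn(replacement, line)

print(greedy_count, greedy_out)
print(bounded_count, bounded_out)
```

With the greedy pattern only one replacement is made, because the first match already covers the rest of the line, including the second URL. With the bounded pattern both URLs are rewritten.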

Hope that helps!

Luca

 
