url_rewrite_program doesn't seem to work on squid 2.6 STABLE17

Hi,

I hope that someone on this list can give me some pointers. I have a Squid proxy running version 2.6 STABLE17; I recently upgraded from a very old version, 2.4 something. The proxy sits in front of a search appliance, and all search requests go through the proxy.

One of my requirements is to have every search request for cache:SOMEURL passed to a URL rewrite program that compares the requested URL against a list of blacklisted URLs. The URLs are listed one per line in a text file; any line that is blank or starts with # is ignored by the url_rewrite_program. This Perl program seemed to work fine in the old version, but now it doesn't work at all.
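
For reference, below is a bare-bones skeleton of the rewrite-helper protocol as I understand it under 2.6; the field names are just my own labels, and the 302: response is what my program relies on.
-------------------------------------------------------------------------------
#!/usr/bin/perl
# Bare-bones rewrite-helper skeleton, as I understand the protocol:
# Squid writes one request per line on stdin:
#    URL client_ip/fqdn ident method [urlgroup]
# and expects exactly one line back per request:
#    ""           -> leave the URL alone
#    "newURL"     -> rewrite the request to newURL
#    "302:newURL" -> send the client an HTTP redirect to newURL
use strict;
use warnings;

$| = 1;   # unbuffered output, otherwise Squid waits forever for a reply

while (my $line = <STDIN>) {
   chomp $line;
   my ($url, $client, $ident, $method, $urlgroup) = split ' ', $line;
   # decide what to do with $url here...
   print "\n";   # no change
}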

Here is the relevant portion of my Squid conf file:
-------------------------------------------------------------------------------
http_port 80 defaultsite=linsquid1o.myhost.com accel

url_rewrite_program /webroot/squid/imo/redir.pl
url_rewrite_children 10


cache_peer searchapp3o.myhost.com parent 80 0 no-query originserver name=searchapp proxy-only
cache_peer linsquid1o.myhost.com parent 9000 0 no-query originserver name=searchproxy proxy-only
acl bin urlpath_regex ^/cgi-bin/
cache_peer_access searchproxy allow bin
cache_peer_access searchapp deny bin

Here is the Perl program:
-------------------------------------------------------------------------------
#!/usr/bin/perl


$| = 1;

my $CACHE_DENIED_URL = "http://www.mysite.com/mypage/pageDenied.intel";
my $PATTERNS_FILE = "/webroot/squid/blocked.txt";
my $UPDATE_FREQ_SECONDS = 60;

my $last_update = 0;
my $last_modified = 0;
my $match_function;

my ($url, $remote_host, $ident, $method, $urlgroup);
my $cache_url;

my @patterns;


while (<>) {
   chomp;
   ($url, $remote_host, $ident, $method, $urlgroup) = split;
  
   &update_patterns();

   $cache_url = &cache_url($url);
   if ($cache_url) {
      &update_patterns();
      if (&$match_function($cache_url)) {
         $cache_url = &url_encode($cache_url);
         print "302:$CACHE_DENIED_URL?URL=$cache_url\n";
         next;
      }
   }
   print "\n";
}

sub update_patterns {
   my $now = time();
   if ($now > $last_update + $UPDATE_FREQ_SECONDS) {
      $last_update = $now;   # remember the last check so the file is only stat'ed once per interval
      my @a = stat($PATTERNS_FILE);
      my $mtime = $a[9];
      if ($mtime != $last_modified) {
         @patterns = &get_patterns();
         $match_function = build_match_function(@patterns);
         $last_modified = $mtime;
      }
   }
}


sub get_patterns {
   my @p = ();
   my $p = "";
   open PATTERNS, "< $PATTERNS_FILE" or die "Unable to open patterns file. $!";
   while (<PATTERNS>) {
      chomp;
      if (!/^\s*#/ && !/^\s*$/) {    # disregard comments and empty lines.
         $p = $_;
         $p =~ s#\/#\\/#g;
         $p =~ s/^\s+//g;
         $p =~ s/\s+$//g;
         if (&is_valid_pattern($p)) {
            push(@p, $p);
         }
      }
   }
   close PATTERNS;
   return @p;
}

sub is_valid_pattern {
   my $pat = shift;
   return eval { "" =~ m|$pat|; 1 } || 0;
}


sub build_match_function {
   my @p = @_;
   my $expr = join(' || ', map { "\$_[0] =~ m/$p[$_]/io" } (0..$#p));
   my $mf = eval "sub { $expr }";
   die "Failed to build match function: $@" if $@;
   return $mf;
}

sub cache_url {
   my $url = $_[0];
   my ($script, $qs) = split(/\?/, $url);
   if ($qs) {
      my ($param, $name, $value);
      my @params = split(/&/, $qs);
      foreach $param (@params) {
         ($name, $value) = split(/=/, $param);
         $value =~ tr/+/ /;
         $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
         if ($value =~ /cache:([A-Za-z0-9]{7,20}:)?([A-Za-z]+:\/\/)?([^ ]+)/) {
            if ($2) {
               return $2 . $3;
            } else {
               # return "http://"; . $3;
               return $3;
            }
         }
      }
   }
   return "";
}

sub url_encode {
   my $str = $_[0];
   $str =~ tr/ /+/;
   $str =~ s/([\?&=:\/#])/sprintf("%%%02x", ord($1))/eg;
   return $str;
}
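
To rule out the helper itself, I have also been testing it outside of Squid with a small driver like the one below (the client address and the query string are made-up examples; only the helper path is real):
-------------------------------------------------------------------------------
#!/usr/bin/perl
# Crude standalone test for redir.pl outside of Squid.  The request line
# below is a made-up example in the rewrite-helper input format:
#    URL client_ip/fqdn ident method
use strict;
use warnings;
use IPC::Open2;

my $pid = open2(\*HELPER_OUT, \*HELPER_IN, '/webroot/squid/imo/redir.pl');

print HELPER_IN "http://linsquid1o.myhost.com/cgi-bin/search?q=cache:www.badsite.com/page.html 10.0.0.1/- - GET -\n";
close HELPER_IN;                      # flush and signal EOF to the helper

my $reply = <HELPER_OUT>;
print "helper replied: $reply";       # expect a 302:... line, since www.badsite.com/ is blocked

waitpid($pid, 0);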

Below is a sample of the blocked URLs file
################################################################################
#
# URL Patterns to be Blocked
#---------------------------
# This file contains URL patterns which should be blocked
# in requests to the Google cache.
#
# The URL patterns should be entered one per line.
# Blank lines and lines that begin with a hash mark (#)
# are ignored.
#
# Anything that will work inside a Perl regular expression
# should work.
#
# Examples:
# http://www.bad.host/bad_directory/
# ^ftp:
# bad_file.html$
################################################################################
# Enter URLs below this line
################################################################################


www.badsite.com/


So my questions: is there a better way of doing this?
Does anyone see anything that would keep this from working in 2.6?

Thanks,
Martin C. Jacobson (Jake)
