Re: ACL matches when it shouldn't
Of course this script is sluggish: it reads many category files and forks at least 3-6 times (grep, head and cut for each qx{} call) per category checked.

If you *really* want to implement this with a Perl script, it should read all the files at startup and do each lookup with Perl data structures.
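
A minimal sketch of that approach, reusing the variable names from Vieri's script below (the %blocked hash and the one-domain-per-line layout are my assumptions):

# Load every category's domain list into one hash at startup.
# %blocked maps domain => category name, so each lookup is O(1).
my %blocked;
for my $cat (@categories) {
    open(my $fh, '<', "$cats_where/$cat/domains") or next;  # skip missing files
    while (my $domain = <$fh>) {
        chomp $domain;
        $blocked{$domain} = $cat if length $domain;
    }
    close $fh;
}

# Per-request lookup: no forking, no file scans.
if (my $cat = $blocked{$uri_dst}) {
    $status = "$cid ERR message=\"Domain $uri_dst in BL $cat\"";
}

Keep in mind that with children-max=80 each helper child holds its own copy of the hash, so 50MB of lists can turn into a lot of resident memory.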

But I suggest looking at ufdbGuard, a URL filter that is far faster and has all the functionality that you need.

Marcus


On 2020-10-02 10:08, Vieri wrote:
Regarding the use of an external ACL, I quickly implemented a Perl script that "does the job", but it seems to be somewhat sluggish.

This is how it's configured in squid.conf:
external_acl_type bllookup ttl=86400 negative_ttl=86400 children-max=80 children-startup=10 children-idle=3 concurrency=8 %PROTO %DST %PORT %PATH /opt/custom/scripts/squid/ext_txt_blwl_acl.pl --categories=adv,aggressive,alcohol,anonvpn,automobile_bikes,automobile_boats,automobile_cars,automobile_planes,chat,costtraps,dating,drugs,dynamic,finance_insurance,finance_moneylending,finance_other,finance_realestate,finance_trading,fortunetelling,forum,gamble,hacking,hobby_cooking,hobby_games-misc,hobby_games-online,hobby_gardening,hobby_pets,homestyle,ibs,imagehosting,isp,jobsearch,military,models,movies,music,podcasts,politics,porn,radiotv,recreation_humor,recreation_martialarts,recreation_restaurants,recreation_sports,recreation_travel,recreation_wellness,redirector,religion,remotecontrol,ringtones,science_astronomy,science_chemistry,sex_education,sex_lingerie,shopping,socialnet,spyware,tracker,updatesites,urlshortener,violence,warez,weapons,webphone,webradio,webtv
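
For context, a minimal sketch of the read loop this configuration implies (with concurrency=8, squid prefixes each request line with a channel ID that the reply must echo back; the field order follows the %PROTO %DST %PORT %PATH format above):

$| = 1;  # unbuffered STDOUT so squid sees each reply immediately
while (my $line = <STDIN>) {
    chomp $line;
    # channel ID first, then the %PROTO %DST %PORT %PATH values
    my ($cid, $proto, $uri_dst, $port, $uri_path) = split ' ', $line, 5;
    my $status = "$cid OK";  # default: allow
    # ... the category checks below set $status to "$cid ERR message=..." ...
    print "$status\n";
}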

I'd like to avoid using a DB if possible, but maybe someone here has an idea to share on flat-file text searches.

Currently the directory structure of my blacklists is:

topdir/
    category1/ ... categoryN/
        domains, urls

So one example file to search is topdir/category8/urls, and so on.

The helper Perl script contains this code to decide whether to block access:

foreach my $cat ( @categories )
{
    # Exact whole-line match (grep -x) against this category's urls file;
    # each qx{} forks a shell plus grep, head and cut.
    chomp($s_urls = qx{grep -nwx '$uri_dst$uri_path' $cats_where/$cat/urls | head -n 1 | cut -f1 -d:});

    if (length($s_urls) > 0) {
        if ($whitelist == 0) {
            $status = $cid." ERR message=\"URL ".$uri_dst." in BL ".$cat." (line ".$s_urls.")\"";
        } else {
            $status = $cid." ERR message=\"URL ".$uri_dst." not in WL ".$cat." (line ".$s_urls.")\"";
        }
        next;  # URL matched: skip the domains file and try the next category
    }

    # Same exact-match lookup against this category's domains file.
    chomp($s_urls = qx{grep -nwx '$uri_dst' $cats_where/$cat/domains | head -n 1 | cut -f1 -d:});

    if (length($s_urls) > 0) {
        if ($whitelist == 0) {
            $status = $cid." ERR message=\"Domain ".$uri_dst." in BL ".$cat." (line ".$s_urls.")\"";
        } else {
            $status = $cid." ERR message=\"Domain ".$uri_dst." not in WL ".$cat." (line ".$s_urls.")\"";
        }
        next;  # note: a later category match will overwrite $status
    }
}

There are currently 66 "categories" with around 50MB of text data in all.
So that's a lot to go through each time there's an HTTP request.
Apart from placing these blacklists on a ramdisk (they are currently on an M.2 SSD, so I'm not sure I'd notice any difference), what else can I try?
Should I reindex the lists and group them all alphabetically?
For instance, should I process the lists to generate a directory structure as follows (a preprocessing sketch follows the examples below)?

topdir/
    a/ b/ c/ d/ e/ f/ ... x/ y/ z/ 0/ 1/ 2/ 3/ ... 7/ 8/ 9/
        domains, urls

An example for a client requesting https://www.google.com/ would lead to searching only 2 files:
topdir/w/domains
topdir/w/urls

An example for a client requesting https://01.whatever.com/x would also lead to searching only 2 files:
topdir/0/domains
topdir/0/urls

An example for a client requesting https://8.8.8.8/xyz would also lead to searching only 2 files:
topdir/8/domains
topdir/8/urls
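
A minimal sketch of such a preprocessing pass, assuming one domain per line in each source file (the bucket layout and the '_' catch-all bucket are my choices):

use File::Path qw(make_path);

my %bucket_fh;  # one output handle per first-character bucket
for my $cat (@categories) {
    open(my $in, '<', "$cats_where/$cat/domains") or next;
    while (my $domain = <$in>) {
        chomp $domain;
        next unless length $domain;
        my $b = lc substr($domain, 0, 1);    # bucket by first character
        $b = '_' unless $b =~ /^[a-z0-9]$/;  # catch-all for unusual names
        unless ($bucket_fh{$b}) {
            make_path("topdir/$b");
            open($bucket_fh{$b}, '>>', "topdir/$b/domains") or die "topdir/$b/domains: $!";
        }
        print { $bucket_fh{$b} } "$domain\n";
    }
    close $in;
}
close $_ for values %bucket_fh;

The same loop pointed at each urls file would produce the topdir/<c>/urls counterparts. Note that merging categories this way loses the category name from the block message unless it is stored alongside each entry.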

Any ideas or links to scripts that already prepare lists for this?

Thanks,

Vieri
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users