-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

27.04.16 20:01, Amos Jeffries wrote:
> On 27/04/2016 11:32 p.m., Alfredo Rezinovsky wrote:
>> I saw in the debug log that when an ACL has many regexes, each one is
>> compared sequentially.
>>
>> If I have
>>
>> www.facebook.com
>> facebook.com
>> www.google.com
>> google.com
>>
>> will it be faster to check just ONE optimized regex like
>> (www\.)?(facebook|google)\.com than the previous four?
>>
>> I'm really talking about optimizing about 3000 URL regexes into one huge
>> regex, because comparing each and every URL to 3000 regexes is too slow.
>
> As Yuri was trying to point out (I think), simply using one bigger regex
> pattern does not always mean faster.

Absolutely yes. For example: in my experience, the expression (.*) used
for group selection takes many more steps than (.*?) or (.+?). Yes, the
latter expressions often have a different meaning, but as part of
optimization this method is useful as a partial solution.

Also, the site I pointed to contains an "explanation" section, which is a
good starting point for performance tuning of regexps.

In short: you can think of regex "steps" as the equivalent of "CPU
cycles", just to simplify. And yes, the dependency is direct: more steps,
more cycles, slower execution.

>> I know using
>> (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com) with
>> PCRE will produce the same optimized result as
>> (www\.)?(facebook|google)\.com. Squid uses GnuRegex. Does the GNURegex lib
>> optimize this as well?
>
> If you actually pass GNURegex that *single* pattern. Yes, it will do
> some optimization. Though I'm not sure how much exactly in comparison to
> PCRE.
>
> * Also, while GNURegex is the built-in backup regex engine bundled with
> Squid. It really is only a backup engine for systems like Windows which
> don't provide a regex engine. The stdlib regex library is always used if
> available. On some OSes that stdlib engine is GNU, on others PCRE or
> something even better.
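
To illustrate the equivalence discussed above, here is a minimal sketch
using Python's `re` module. This is an assumption made purely for
illustration: Squid uses the system regex library or GNURegex, not Python,
and the sample host names are hypothetical. The sketch only checks that the
four separate patterns and the single combined pattern reach the same match
decision; it says nothing about which is faster in any particular engine.

```python
import re

# The four patterns from the question, unanchored as written there.
SEPARATE = [
    re.compile(r"www\.facebook\.com"),
    re.compile(r"facebook\.com"),
    re.compile(r"www\.google\.com"),
    re.compile(r"google\.com"),
]

# The single hand-optimized pattern from the question.
COMBINED = re.compile(r"(www\.)?(facebook|google)\.com")


def match_separate(host):
    """Test the host against each pattern in turn, one test per pattern."""
    return any(p.search(host) for p in SEPARATE)


def match_combined(host):
    """Test the host against the one combined pattern."""
    return COMBINED.search(host) is not None


# Hypothetical sample host names, chosen only for this illustration.
SAMPLES = [
    "www.facebook.com", "facebook.com", "www.google.com", "google.com",
    "mail.google.com", "example.org", "googleXcom",
]

# Map each host to (separate-patterns result, combined-pattern result).
results = {h: (match_separate(h), match_combined(h)) for h in SAMPLES}
```

Both columns of `results` agree for every sample, so the combined pattern
can stand in for the list as far as match decisions go. Whether it costs
fewer steps is a separate question that has to be measured in the engine
actually in use.
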
>
> What you see in the log is the fact that Squid is actually *not*
> configured with a single compound "optimized" pattern. You are actually
> using a file with ~3000 patterns in it ... so 3000 regex patterns to be
> checked against the URL.
>
> Whether Squid checks 3000 tests or some smaller number depends on what
> Squid version you are using. The recent versions do some trivial pattern
> aggregation and strip away prefix/suffix ".*" garbage to help the
> library optimize better. But as Yuri showed, a bigger pattern is not
> necessarily better in *steps* for per-test speed. The gains are mostly in
> reduced Squid code CPU time and RAM overheads.
> Regex is still the slowest of the ACLs in terms of raw CPU consumed.
>
> The biggest problem with using regex for domain name lists is that regex
> is optimized for left-to-right comparisons. Domain name labels are built
> right-to-left. dstdomain is optimized for right-to-left comparison with
> an early-abort on mismatch and sub-domain wildcards - which gives it a
> huge advantage in CPU cycles over regex.
>
> Amos
>
> _______________________________________________
> squid-users mailing list
> squid-users@xxxxxxxxxxxxxxxxxxxxx
> http://lists.squid-cache.org/listinfo/squid-users

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJXIMsfAAoJENNXIZxhPexGuZ8H/2DNMNKp3u/3kmOsUczWH4KG
mP09zPzbPu7veniLOR30RGFZEbAFr0UxPGnaASyzzRMbJZ2ChAqUEtwsJvT2+lCL
g0lNZ5GPdnBh8DECrR0Cu5cV67Y8fXeQRdxYJlnjQdD4UH5thg6iZbOYNqOZLkOr
FiCpK6m6J32QH9EgI5x8GwhZBxpEJLyilqeAaku3kxTY4yqeguiSh6L4srfYhc+U
EPCR7q+dYrQ1UuroenHlCYnXLX/KmDD5AUA5AdxML1bNpTo1z7tVrdDVXbbBofIb
CZ+Y9duuBtJ5zaYi2qVbROolx7GDDwT2zdhniA+UNaMhx6k2RMnKZHTcFScfsE8=
=2fLk
-----END PGP SIGNATURE-----
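
Amos's point about right-to-left comparison can be sketched roughly as
follows. This is not Squid's dstdomain implementation, only a Python
illustration under stated assumptions: the function names are hypothetical,
and the rule syntax assumes that a leading "." means "this domain and all of
its sub-domains", while a rule without it matches only the exact name.

```python
def labels_reversed(name):
    """Split a domain name into labels, rightmost (TLD) label first."""
    return name.lower().strip(".").split(".")[::-1]


def domain_matches(host, rule):
    """Return True if host matches rule, comparing labels right-to-left.

    Assumed rule syntax: ".google.com" matches google.com and any
    sub-domain; "google.com" matches only that exact name.
    """
    wildcard = rule.startswith(".")
    rule_labels = labels_reversed(rule)
    host_labels = labels_reversed(host)

    # A host with fewer labels than the rule can never match.
    if len(host_labels) < len(rule_labels):
        return False

    # Early abort: stop at the first label that differs, starting at the TLD.
    for rule_label, host_label in zip(rule_labels, host_labels):
        if rule_label != host_label:
            return False

    # All rule labels matched; extra host labels need the wildcard form.
    return wildcard or len(host_labels) == len(rule_labels)
```

For example, `domain_matches("mail.google.com", ".google.com")` is true,
while `domain_matches("example.org", ".google.com")` is rejected on the
very first label, since "org" differs from "com". A left-to-right regex
has no such shortcut for names that differ only at the end.
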