Furthermore, the more specific a regular expression is, the faster it
usually is.

27.04.16 20:01, Amos Jeffries wrote:
> On 27/04/2016 11:32 p.m., Alfredo Rezinovsky wrote:
>> I saw in debug log that when an ACL has many regexes each one is compared
>> sequentially.
>>
>> If I have
>>
>> www.facebook.com
>> facebook.com
>> www.google.com
>> google.com
>>
>> Will it be faster to check just ONE optimized regex like
>> (www\.)?(facebook|google)\.com than the previous four?
>>
>> I'm really talking about optimizing about 3000 url regexes in one huge
>> regex because comparing each and every url to 3000 regexes is too slow.
>
> As Yuri was trying to point out (I think), simply using one bigger regex
> pattern does not always mean faster.
>
>>
>> I know using
>> (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com) with
>> PCRE will produce the same optimized result as
>> (www\.)?(facebook|google)\.com. Squid uses GnuRegex. Does the GNURegex lib
>> optimize this as well?
>
> If you actually pass GNURegex that *single* pattern, yes, it will do
> some optimization. Though I'm not sure how much exactly in comparison to
> PCRE.
>
> * Also, while GNURegex is the built-in regex engine bundled with Squid,
> it really is only a backup engine for systems like Windows which
> don't provide a regex engine. The stdlib regex library is always used if
> available. On some OS that stdlib engine is GNU, on others PCRE or
> something even better.
>
>
> What you see in the log is the fact that Squid is actually *not*
> configured with a single compound "optimized" pattern. You are actually
> using a file with ~3000 patterns in it ... so 3000 regex patterns to be
> checked against the URL.
>
> Whether Squid checks 3000 tests or some smaller number depends on what
> Squid version you are using. The recent versions do some trivial pattern
> aggregation and strip away prefix/suffix ".*" garbage to help the
> library optimize better. But as Yuri showed, a bigger pattern is not
> necessarily better in *steps*, i.e. in per-test speed. The gains are
> mostly in reduced Squid code CPU time and RAM overheads.
> Regex is still the slowest of the ACLs in terms of raw CPU consumed.
>
>
> The biggest problem with using regex for domain name lists is that regex
> is optimized for left-to-right comparisons. Domain name labels are built
> right-to-left. dstdomain is optimized for right-to-left comparison with
> an early-abort on mismatch and sub-domain wildcards - which gives it a
> huge advantage in CPU cycles over regex.
>
> Amos
>
> _______________________________________________
> squid-users mailing list
> squid-users@xxxxxxxxxxxxxxxxxxxxx
> http://lists.squid-cache.org/listinfo/squid-users
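
To make the two points above concrete, here is a minimal Python sketch. It is
not Squid code and not how Squid implements its ACLs. It only illustrates
(a) that the single combined pattern from the question accepts the same hosts
as the four separate ones, and (b) what a right-to-left, label-by-label
comparison with early abort looks like. The helper names (match_separate,
match_combined, dstdomain_like) are hypothetical, the patterns are anchored
here as an assumption to keep the comparison exact, and the leading-dot
convention for sub-domain wildcards is modelled on dstdomain entries.

```python
import re

# The four host patterns from the question, anchored for an exact comparison.
# Anchoring is an assumption of this sketch, not something from the thread.
SEPARATE = [
    re.compile(r"^www\.facebook\.com$"),
    re.compile(r"^facebook\.com$"),
    re.compile(r"^www\.google\.com$"),
    re.compile(r"^google\.com$"),
]

# The single combined pattern proposed in the question.
COMBINED = re.compile(r"^(www\.)?(facebook|google)\.com$")


def match_separate(host):
    """Test the host against each pattern in turn (sequential checks)."""
    return any(p.search(host) for p in SEPARATE)


def match_combined(host):
    """Test the host against the one combined pattern."""
    return COMBINED.search(host) is not None


def dstdomain_like(host, entry):
    """Compare domain labels right-to-left, aborting on the first mismatch.

    A leading '.' in entry means "this domain and any sub-domain".
    Hypothetical helper for illustration, not Squid's implementation.
    """
    wildcard = entry.startswith(".")
    want = entry.strip(".").lower().split(".")
    have = host.rstrip(".").lower().split(".")
    if len(have) < len(want):
        return False
    # Start at the rightmost label (the TLD) and walk left.
    for w, h in zip(reversed(want), reversed(have)):
        if w != h:
            return False  # early abort: no need to look at further labels
    # All of entry's labels matched; extra labels on the host are
    # only acceptable for a wildcard entry.
    return wildcard or len(have) == len(want)
```

For a host such as "example.org", the right-to-left comparison stops at the
first label ("org" vs "com"), whereas a regex engine starts from the left of
the string. How much that matters in practice depends on the regex library
and the pattern list, so measure before relying on it.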