On 27/04/2016 11:32 p.m., Alfredo Rezinovsky wrote:
> I saw in debug log that when an ACL has many regexes each one is compared
> sequentially.
>
> If I have
>
> www.facebook.com
> facebook.com
> www.google.com
> google.com
>
> If will be faster to check just ONE optimized regex like
> (www\.)?(facebook|google).com than the previous three?
>
> I'm really talking about optimizing about 3000 url regexes in one huge
> regex because comparing each and every url to 3000 regexes is too slow.

As Yuri was trying to point out (I think), simply using one bigger regex
pattern does not always mean faster matching.

>
> I know using
> (www\.facebook\.com)|(facebook\.com)|(www\.google\.com)|(google\.com) with
> PCRE will produce the same optimized result as
> (www\.)?(facebook|google)\.com. Squid uses GnuRegex. Does GNURegex lib
> optimizes this as well ?

If you actually pass GNURegex that *single* pattern, then yes, it will do
some optimization. Though I'm not sure how much exactly in comparison to
PCRE.

* Also, while GNURegex is the built-in regex engine bundled with Squid, it
really is only a backup engine for systems like Windows which don't
provide a regex engine. The stdlib regex library is always used if
available. On some OSes that stdlib engine is GNU, on others PCRE or
something even better.

What you see in the log is the fact that Squid is actually *not*
configured with a single compound "optimized" pattern. You are actually
using a file with ~3000 patterns in it ... so 3000 regex patterns to be
checked against the URL.

Whether Squid checks 3000 tests or some smaller number depends on what
Squid version you are using. The recent versions do some trivial pattern
aggregation and strip away prefix/suffix ".*" garbage to help the library
optimize better.

But as Yuri showed, a bigger pattern is not necessarily better in terms of
*steps* or per-test speed. The gains are mostly in reduced Squid code CPU
time and RAM overheads. Regex is still the slowest of the ACLs in terms of
raw CPU consumed.
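To make the "number of tests" point concrete, here is a rough sketch in Python. It is only an illustration: Python's `re` module is neither GNURegex nor any platform stdlib engine, the names `PATTERNS`, `match_sequential` and `match_combined` are made up for this example, and the single pattern is the one from the question with the dot before "com" escaped. It counts how many regex tests are issued per URL, which says nothing about how much work the engine does inside each test.

```python
import re

# Hypothetical stand-in for the pattern file: the four entries from the question.
PATTERNS = [
    r"www\.facebook\.com",
    r"facebook\.com",
    r"www\.google\.com",
    r"google\.com",
]

# One compiled regex per entry, checked one after another.
SEPARATE = [re.compile(p) for p in PATTERNS]

# The hand-aggregated single pattern from the question (dot escaped).
COMBINED = re.compile(r"(www\.)?(facebook|google)\.com")


def match_sequential(url):
    """Return (matched, number of regex tests performed)."""
    tests = 0
    for rx in SEPARATE:
        tests += 1
        if rx.search(url):
            return True, tests
    return False, tests


def match_combined(url):
    """Return the same answer using a single regex test."""
    return COMBINED.search(url) is not None, 1


if __name__ == "__main__":
    for url in ("http://www.google.com/search", "http://example.org/"):
        print(url, match_sequential(url), match_combined(url))
```

A URL that matches nothing pays for every pattern in the sequential case, which is the common case for a blocklist. The combined pattern issues one test, but the engine may still do comparable work internally.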
The biggest problem with using regex for domain name lists is that regex
is optimized for left-to-right comparisons. Domain names are built
right-to-left: the most significant label (the TLD) comes last. dstdomain
is optimized for right-to-left comparison with an early abort on mismatch
and sub-domain wildcards, which gives it a huge advantage in CPU cycles
over regex.

Amos
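PS: a minimal sketch of the right-to-left idea, in Python. This is not Squid's dstdomain code; the trie layout and the names `build_suffix_trie` and `matches` are assumptions made for illustration. It also treats every entry as a sub-domain wildcard, the way a leading-dot dstdomain entry such as ".google.com" behaves.

```python
END = object()  # marker: a listed domain ends at this trie node


def build_suffix_trie(domains):
    """Build a trie keyed by domain labels in right-to-left order."""
    root = {}
    for domain in domains:
        node = root
        # "www.google.com" is walked as com -> google -> www
        for label in reversed(domain.lower().strip(".").split(".")):
            node = node.setdefault(label, {})
        node[END] = True
    return root


def matches(trie, host):
    """True if host equals a listed domain or is a sub-domain of one."""
    node = trie
    for label in reversed(host.lower().rstrip(".").split(".")):
        node = node.get(label)
        if node is None:
            return False  # early abort on the first mismatching label
        if END in node:
            return True   # a listed domain is a label-wise suffix of host
    return False


if __name__ == "__main__":
    trie = build_suffix_trie(["facebook.com", ".google.com"])
    for host in ("www.facebook.com", "notfacebook.com", "example.org"):
        print(host, matches(trie, host))
```

A host like "example.org" is rejected after looking at a single label, no matter how many domains are in the list. A regex list has to run every pattern against it first.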