On Wed, Aug 21, 2013 at 05:27:55PM +0100, Andrew Wood wrote:
> Hi
>
> Can someone please help me work out an algorithm to remove overlapping
> subdomains from a blacklist such as shallalist to prevent errors such as:
>
> ERROR: 'interracialcandy.tumblr.com' is a subdomain of '.tumblr.com'
> 2013/08/21 17:18:41| ERROR: because of this '.tumblr.com' is ignored to
> keep splay tree searching predictable
> 2013/08/21 17:18:41| ERROR: You should remove
> 'interracialcandy.tumblr.com' from the ACL named 'ProhibitedSitesDomains'

Is it your intention to block tumblr.com and all of its subdomains?
And are .tumblr.com (non-adult) and interracialcandy.tumblr.com (adult)
both in the same list?

You can use ufdbguard, a URL filter for Squid. The ufdbguard software
suite has a utility called ufdbGenTable that converts text files with
domains and URLs to a database table. During the conversion it emits
similar errors but behaves differently from Squid: if both
subdomain.example.com and example.com are in a list, ufdbGenTable puts
example.com in the table, which effectively blocks example.com and all
of its subdomains.

Marcus

> The problem is that TLDs like .com or .net are easy but some domains
> have two 'tlds' such as .co.uk (yes I know strictly that's not a tld but
> you know what I mean!) and there are so many different country domains,
> some with two levels and the possibility of more in the future. How can I
> make it future proof?
>
> I'm sure I'm not the only one to tear my hair out on this but I can't find
> a solution anywhere.
> Perhaps we can collaborate on here to produce a Perl or Python script
> which anyone can use?
>
> Thanks
> Andrew
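
On the script asked for in the quoted message: a rough Python sketch follows. It is not taken from ufdbGenTable or Squid. It only mirrors the "keep the parent, drop the subdomain" behaviour described above. It assumes every entry in the list is meant to cover the domain and all of its subdomains, and the function name remove_overlapping is made up for illustration. No knowledge of TLDs or two-level suffixes such as .co.uk is needed, because an entry is dropped only when one of its parent domains is itself present in the list.

```python
def remove_overlapping(domains):
    """Return the entries that are not subdomains of another entry.

    Assumption: each entry stands for the domain plus all subdomains,
    so a leading dot (Squid style) is stripped before comparing.
    """
    # Normalise: trim whitespace, lower-case, drop leading/trailing dots.
    cleaned = {d.strip().lower().strip(".") for d in domains}
    cleaned.discard("")  # ignore blank lines

    kept = []
    for domain in sorted(cleaned):
        labels = domain.split(".")
        # Proper parents of a.b.c are b.c and c.
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        # Keep the entry only if none of its parents is in the list.
        if not any(parent in cleaned for parent in parents):
            kept.append(domain)
    return kept


if __name__ == "__main__":
    sample = [
        "interracialcandy.tumblr.com",
        ".tumblr.com",
        "www.example.co.uk",
        "example.co.uk",
    ]
    for name in remove_overlapping(sample):
        # Squid's dstdomain treats a leading dot as "domain and subdomains".
        print("." + name)
```

Since co.uk is not in the sample list, example.co.uk is kept and only www.example.co.uk is dropped. A new multi-level suffix would be handled the same way without any change to the script. If some entries are meant to match one exact host only, this approach is too aggressive and the list would need to be split first.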