On Sat, Jun 23, 2007, Andreas Pettersson wrote:

> I would :)
> However Phishtank publishes a full xml file which with some tweaking
> could be converted into a plain text list of domains or urls for direct
> use with squid.
> http://www.phishtank.com/blog/2006/10/17/xml-data-file-of-online-valid-phishes-from-phishtank/

Yup; I'm working on loading that into a hash for lookups, after
normalising the URLs (removing the protocol, user@password and anchor;
the query can't be removed, as some phish URLs bounce via well-known
services like google, live.com, etc.)

> I'm not sure realtime lookups via the google or phishtank api could keep
> up with caches serving over 100 requests/sec.

The lookups have to be done locally, with server updates pushed out, a la
the Google safebrowsing hash updates. If I get this stuff done I'm hoping
the phishtank guys will release diffs to their XML database file.

My code won't be using the live APIs; it'll download the XML database
(phishtank) and hash database (google) locally and load them into an
external_acl helper.

> By the way, haven't a DNSBL for this purpose been discussed previously?

DNSBLs require one of two things:

* a crapload of infrastructure to service the DNSBL, as you're talking
  about caches which could issue thousands of requests a second; or
* local DNS zones which are then loaded into private DNS servers (like
  ordb used to do) so you can look up against those.

Adding an extra few hundred milliseconds per request is probably not
going to help the browsing experience.

Adrian
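For anyone wanting to experiment, here's a rough sketch (not Adrian's actual code) of the normalisation step he describes: drop the scheme, the user:password part and the anchor (fragment), but keep the host, path and query string, since phish URLs often bounce via well-known services and the query is where the payload lives.

```python
# Sketch of the normalisation described above -- assumptions, not the
# real implementation.  Drops scheme, userinfo and fragment; keeps
# host + path + query as the hash lookup key.
from urllib.parse import urlsplit

def normalise(url: str) -> str:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()   # .hostname strips user:password@
    key = host + parts.path
    if parts.query:
        key += "?" + parts.query            # query kept (redirector phishes)
    return key                              # fragment (anchor) discarded

# e.g. normalise("http://user:pw@Example.COM/login?next=http://evil/#top")
#      -> "example.com/login?next=http://evil/"
```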
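The external_acl helper side could look something like the sketch below. A Squid external_acl helper reads one lookup key per line on stdin and answers "OK" (matches the acl) or "ERR" per line; here I'm assuming the key is the normalised URL and that the downloaded phishtank data has already been flattened into a plain text file of normalised URLs (file path and format are my assumptions, not part of the mail).

```python
#!/usr/bin/env python3
# Sketch of an external_acl helper for the scheme described above.
# Assumes squid.conf passes the URL as the lookup key and that the
# blacklist file (path is hypothetical) holds one normalised URL per line.
import sys

def load_blacklist(path):
    """Load the plain-text phish list into a set for O(1) lookups."""
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())

def main(path="/var/lib/squid/phishtank.txt"):  # hypothetical path
    phish = load_blacklist(path)
    for line in sys.stdin:
        url = line.strip()
        # "OK" = known phish (acl matches), "ERR" = not listed.
        sys.stdout.write("OK\n" if url in phish else "ERR\n")
        sys.stdout.flush()  # Squid expects one reply line per request

if __name__ == "__main__":
    main(*sys.argv[1:])
```

The set lookup keeps per-request cost trivial even at 100+ req/sec; the expensive part (fetching and parsing the XML) happens out-of-band when the file is rebuilt.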