Adrian Chadd wrote:
Yup; I'm working on loading that into a hash for lookups, after normalising the URLs (removing the protocol, user:password@ and anchor; I can't remove the query, as some phish URLs bounce via well-known services like Google, live.com, etc.)
I was actually having a thought about that. Is a URL hash the only way to go? It is easy for a phisher to wildcard a whole subdirectory or even a subdomain, which makes hashing individual URLs nearly useless. Perhaps there should also be an optional list of domains or destination addresses, for blocking hosts that are obviously used only for phishing.
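
To make the idea concrete, here is a rough Python sketch of what I mean, not a proposal for the actual implementation. The names normalise(), host_blocked(), is_phish() and the two sets are made up for illustration. It normalises the URL the way described above (protocol, credentials and anchor dropped, query kept), tries the exact-URL hash first, and then falls back to a host list that also catches wildcarded subdomains. Destination addresses are assumed to be stored as plain strings in the same host set.

```python
import re
from urllib.parse import urlsplit

SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def normalise(url):
    """Return (host, key): key is the URL without scheme, credentials or anchor."""
    if not SCHEME_RE.match(url):
        url = "http://" + url          # urlsplit only finds the host after "//"
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    key = host
    if parts.port:
        key += ":%d" % parts.port
    key += parts.path or "/"
    if parts.query:                    # kept: redirector phish live in the query
        key += "?" + parts.query
    return host, key

def host_blocked(host, blocked_hosts):
    """True if host, or any parent domain of it, is in blocked_hosts."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked_hosts for i in range(len(labels)))

def is_phish(url, phish_urls, blocked_hosts):
    """Exact normalised-URL lookup first, then the host/domain list."""
    host, key = normalise(url)
    return key in phish_urls or host_blocked(host, blocked_hosts)

# Hypothetical data, for illustration only.
phish_urls = {"www.google.com/url?q=http://evil.example/x"}
blocked_hosts = {"evil.example", "192.0.2.7"}
```

With this, a wildcarded host such as a.b.evil.example/anything matches through the host list even though that exact URL was never hashed, while an ordinary www.google.com URL only matches when its full query is in the URL set.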
-- Andreas