At Fri, 8 Aug 2008 15:52:07 -0400 (EDT), Leichter, Jerry wrote: > > | > > Funnily enough I was just working on this -- and found that we'd > | > > end up adding a couple megabytes to every browser. #DEFINE > | > > NONSTARTER. I am curious about the feasibility of a large bloom > | > > filter that fails back to online checking though. This has side > | > > effects but perhaps they can be made statistically very unlikely, > | > > without blowing out the size of a browser. > | > Why do you say a couple of megabytes? 99% of the value would be > | > 1024-bit RSA keys. There are ~32,000 such keys. If you devote an > | > 80-bit hash to each one (which is easily large enough to give you a > | > vanishingly small false positive probability; you could probably get > | > away with 64 bits), that's 320KB. Given that the smallest Firefox > | > [...] > You can get by with a lot less than 64 bits. People see problems like > this and immediately think "birthday paradox", but there is no "birthday > paradox" here: You aren't look for pairs in an ever-growing set, > you're looking for matches against a fixed set. If you use 30-bit > hashes - giving you about a 120KB table - the chance that any given > key happens to hash to something in the table is one in a billion, > now and forever. (Of course, if you use a given key repeatedly, and > it happens to be that 1 in a billion, it will hit every time. So an > additional table of "known good keys that happen to collide" is worth > maintaining. Even if you somehow built and maintained that table for > all the keys across all the systems in the world - how big would it > get, if only 1 in a billion keys world-wide got entered?) I don't believe your math is correct here. Or rather, it would be correct if there was only one bad key. Remember, there are N bad keys and you're using a b-bit hash, which has 2^b distinct values. If you put N' entries in the hash table, the probability that a new key will have the same digest as one of them is N'/(2^b). If b is sufficiently large to make collisions rare, then N'=~N and we get N/(2^b). To be concrete, we have 2^15 distinct keys, so, the probability of a false positive becomes (2^15)/(2^b)=2^(b-15). To get that probability below 1 billion, b+15 >= 30, so you need about 45 bits. I chose 64 because it seemed to me that a false positive probability of 2^{-48} or so was better. -Ekr