On 03/11/2014 01:18 PM, Nikolai Gorchilov wrote:
> On Tue, Mar 11, 2014 at 6:10 PM, Alex Rousskov wrote:
>> On 03/11/2014 08:05 AM, Omid Kosari wrote:
>>> Is it possible for Squid to automatically find every similar object based on
>>> something like md5 of objects and serve them to clients without need custom
>>> DB ?
>>
>> No, because clients do not tell Squid what checksum they are looking
>> for.
>>
>> It is possible to avoid caching duplicate content, but that allows you
>> to handle cache hits more efficiently. It does not help with cache
>> misses (when the URL requested by the client has not been seen before).
>
> Actually, two commercial vendors - PeerApp and ThunderCache - claim
> their products doesn't use urls to identify the objects, thus they
> don't have to maintain StoreID-like de-duplication database manually.
>
> Any ideas how do they do it?

Most likely they do not, and you are simply being misled by their
marketing claims. In general, it is not possible to ignore the request
URL and still produce the right response (think about it!).

They probably do not store duplicate cache objects, but, as discussed
above, that is far from the "automatic StoreID" functionality that the
original poster is asking about.

In other words, there are at least two de-duplication layers:

* The higher-level one is based on URLs and essentially requires manual
  URL mapping. It helps turn cache misses into hits.

* The lower-level one is based on checksums and can be automated. It
  helps spend less cache space to serve cache hits.

Some commercial products have implemented this lower-level optimization.

Cheers,

Alex.

(*) where cache hit and miss are determined based on the original URL.
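
To make the two layers concrete, here is a minimal Python sketch. It is a toy model and not how Squid or any commercial product is implemented: the class name TwoLayerCache, the (regex, replacement) rule format, the example.com URLs, and the choice of SHA-256 as the checksum are all assumptions made for illustration.

```python
import hashlib
import re


class TwoLayerCache:
    """Toy cache: URL-based store IDs on top of a checksum-keyed body store."""

    def __init__(self, store_id_rules=()):
        # Hypothetical rule format: (compiled regex, replacement string) pairs.
        self.store_id_rules = list(store_id_rules)
        self.index = {}  # store ID (possibly rewritten URL) -> body checksum
        self.blobs = {}  # body checksum -> response body, kept only once

    def store_id(self, url):
        # Higher layer: manual URL mapping. Unmatched URLs map to themselves.
        for pattern, replacement in self.store_id_rules:
            if pattern.search(url):
                return pattern.sub(replacement, url, count=1)
        return url

    def lookup(self, url):
        # The request carries only a URL, never a checksum, so a hit can
        # only be decided from the (mapped) URL.
        digest = self.index.get(self.store_id(url))
        return None if digest is None else self.blobs[digest]

    def store(self, url, body):
        # Lower layer: automatic. Identical bodies share one stored copy.
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)
        self.index[self.store_id(url)] = digest


body = b"identical payload served by two mirrors"
url1 = "http://mirror1.example.com/file.iso"
url2 = "http://mirror2.example.com/file.iso"

# Checksums alone: the second URL is still a miss, but the body is stored once.
plain = TwoLayerCache()
plain.store(url1, body)
print(plain.lookup(url2))   # None
plain.store(url2, body)
print(len(plain.blobs))     # 1

# With a hand-written mapping rule, the second URL becomes a hit.
rules = [(re.compile(r"^http://mirror\d+\.example\.com/"),
          "http://mirror.example.internal/")]
mapped = TwoLayerCache(rules)
mapped.store(url1, body)
print(mapped.lookup(url2) == body)   # True
```

In this sketch the checksum layer needs no configuration, yet it only saves space after the object has been fetched. Only the hand-written rule lets a request for a URL that has not been seen before be answered from the cache.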