> Actually, two commercial vendors - PeerApp and ThunderCache - claim their products don't use URLs to identify the objects, thus they don't have to maintain a StoreID-like de-duplication database manually. Any ideas how they do it? <

Instead of first mapping the URL through a memory-resident table that keeps pointers (file id, bucket number) to the real location of the object on disk, a hash value derived from the URL could directly designate the storage location on disk, avoiding the translation table Squid uses. This is the principle of every hashed table in a fast database system.

The drawback is that you have to deal with "collisions" and "overflows" on disk: hashes for different URLs point to the same storage location. Different solutions to this problem are available, though (chaining, sequential storage, a secondary storage area, etc.). And you have to manage the variable-sized "buckets", the storage locations the hashing points to.

The positive consequence: no rebuild of the in-memory table is necessary, as there is none. This avoids the time-consuming rebuild of the rock storage table from disk.

I can imagine that, for historical reasons (it is much simpler to implement), Squid uses the translation table instead of direct hashing, whereas ThunderCache etc. can rely on some low-level DB system that has direct hashing "ready to be used".

--
View this message in context: http://squid-web-proxy-cache.1019090.n4.nabble.com/Automatic-StoreID-tp4665140p4665198.html
Sent from the Squid - Users mailing list archive at Nabble.com.
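P.S. A toy sketch of the direct-hashing idea with chaining for collisions. This is my own illustration, not how PeerApp, ThunderCache, or Squid actually lay out storage; the bucket count, the `bucket_for` function, and the `DirectHashStore` class are all invented for the example:

```python
import hashlib

NUM_BUCKETS = 8  # tiny, for illustration; a real cache would use far more slots


def bucket_for(url: str) -> int:
    """Map a URL straight to a bucket number via a hash - no translation table."""
    digest = hashlib.sha1(url.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


class DirectHashStore:
    """Toy model of hashed on-disk storage, collisions resolved by chaining."""

    def __init__(self):
        # each bucket holds a chain of (url, object) pairs;
        # URLs that hash to the same bucket share a chain
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def put(self, url: str, obj: bytes) -> None:
        chain = self.buckets[bucket_for(url)]
        for i, (u, _) in enumerate(chain):
            if u == url:              # same URL: overwrite in place
                chain[i] = (url, obj)
                return
        chain.append((url, obj))      # collision: append to the chain

    def get(self, url: str):
        for u, obj in self.buckets[bucket_for(url)]:
            if u == url:
                return obj
        return None                   # cache miss


store = DirectHashStore()
store.put("http://example.com/a", b"payload-A")
store.put("http://example.com/b", b"payload-B")
print(store.get("http://example.com/a"))
```

Note that the lookup needs no in-memory index to rebuild after a restart: recomputing the hash is enough to find the bucket again, which is exactly the advantage claimed above.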