One way to do it reasonably efficiently is to track store.log, where you
have [...]
Many thanks for the idea -- store.log itself is pretty much all I need,
since I'm mostly interested in entries from a short interval before the
present. I didn't know it held that much information. RTFM, as they say :-)
There may be some small drift between this shadow database and the
actual content if you miss some log entries, but it's self-healing over
time as the cache content gets replaced.
For my purposes, even that wouldn't matter: a few lost entries out of tens
of thousands is negligible for this use case.
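Something like this quick filter would probably cover it. A rough sketch,
assuming the first whitespace-separated field of each store.log line is a
Unix timestamp and that the log lives at the path below (both may differ
on your setup):

    #!/usr/bin/env python3
    """Rough sketch: pull store.log entries from the last few minutes."""
    import time

    LOG = "/var/log/squid/store.log"  # hypothetical path; adjust to your install

    def recent_entries(window_seconds=600):
        """Return the fields of every entry newer than window_seconds."""
        cutoff = time.time() - window_seconds
        entries = []
        with open(LOG) as log:
            for line in log:
                fields = line.split()
                try:
                    # assumes the line starts with a Unix timestamp
                    stamp = float(fields[0])
                except (IndexError, ValueError):
                    continue  # skip malformed lines
                if stamp >= cutoff:
                    entries.append(fields)
        return entries

    if __name__ == "__main__":
        for entry in recent_entries():
            print(" ".join(entry))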
--On Tuesday, September 15, 2009 20:44 +0200 Henrik Nordstrom
<henrik@xxxxxxxxxxxxxxxxxxx> wrote:
Tue 2009-09-15 at 18:03 +0100, Genaro Flores wrote:
I guessed so, but I was thinking a specialized tool could do the indexing
for whoever wants or needs it. Maybe I'll try writing a couple of short
scripts for that purpose and for searching the index and retrieving the
targets. I was hoping somebody had done something similar before :-D
Quite likely some have built such tools, but I am not aware of any
published on the Internet.
One way to do it reasonably efficiently is to track store.log, where you
have
- Squid object id
- URL
- MIME type
- time
- HTTP status
- last-modified
- content-length
- object size
- expires
and some other small details.
Just feed this into a database keyed by Squid object id, and indexed on
the relevant pieces of the rest.
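A minimal sketch of that idea in Python, feeding store.log into SQLite. I
am assuming the common store.log field order (time, action, dir#, file#,
cache key, HTTP status, date, last-modified, expires, content-type,
expected/real size, method, URL); verify it against your Squid version,
and treat the table layout as just one possibility:

    #!/usr/bin/env python3
    """Minimal sketch: shadow-index the cache by following store.log."""
    import sqlite3

    def index_store_log(logpath="store.log", dbpath="shadow.db"):
        db = sqlite3.connect(dbpath)
        db.execute("""CREATE TABLE IF NOT EXISTS objects (
                          key TEXT PRIMARY KEY,  -- Squid object id (cache key)
                          time REAL,
                          status INTEGER,
                          last_modified INTEGER,
                          expires INTEGER,
                          mime TEXT,
                          size INTEGER,
                          url TEXT)""")
        db.execute("CREATE INDEX IF NOT EXISTS by_url ON objects(url)")
        db.execute("CREATE INDEX IF NOT EXISTS by_time ON objects(time)")
        with open(logpath) as log:
            for line in log:
                f = line.split()
                if len(f) < 13:
                    continue  # malformed or truncated line
                key, action = f[4], f[1]
                if action == "RELEASE":
                    # Object left the cache; keep the shadow db in step.
                    db.execute("DELETE FROM objects WHERE key = ?", (key,))
                elif action == "SWAPOUT":
                    real_size = int(f[10].split("/")[1])  # "expected/real"
                    db.execute(
                        "INSERT OR REPLACE INTO objects VALUES (?,?,?,?,?,?,?,?)",
                        (key, float(f[0]), int(f[5]), int(f[7]),
                         int(f[8]), f[9], real_size, f[12]))
        db.commit()
        return db

Tailing the live log and applying each line as it arrives keeps the shadow
database roughly in step with the actual cache content.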
There may be some small drift between this shadow database and the
actual content if you miss some log entries, but it's self-healing over
time as the cache content gets replaced.
Regards
Henrik