Re: is there a squid "cache rank" value available for statistics?

Gavin McCullagh <gavin.mccullagh@xxxxxx> · Mon, 20 Apr 2009 10:50:13 +0100

Hi,

On Mon, 20 Apr 2009, Amos Jeffries wrote:

> Squid suffers from a little bit of an anochronism in the way it stores
> object. the classic ufs systems essentially use round-robin and hash to
> determine storage location for each object separately. This works wonders
> on ensuring no clashes, but not so good for retrieval optimization.
> Adrian Chadd has done a lot of study and some work in this area
> particularly for Squid-2.6/2.7. His paper for FreeBSD conference is a good
> read on how disk storage relates to Squid.
> http://www.squid-cache.org/~adrian/talks/20081007%20-%20NYCBSDCON%20-%20Disk%20IO.pdf

Thanks, I'll take a look.

> > I realise there are rules of thumb for cache size, it would be interesting
> > to be able to analyse a particular squid installation.
> 
> Feel free. We would be interested in any improvements you can come up with.

I didn't say anything about being able to improve anything! ;-)

I was hoping if a given user could plot a report of %age of hits coming
from each successive GB of the cache (or a rough guide to that), they could
more easily work out, based on their own workload, what the best value
cache size would be.  Obviously I don't mean physical disk location, I mean
"if my cache were eg halved in size, how many hits would I lose?".

I was naively hoping that the code might easily be leveraged to capture
this "cache rank" and log it for each hit, but the more I think about it,
the less likely this seems.

> IIRC there is a doubly-linked list with tail pointer for LRU.

That was my guess alright, so requeueing is O(1).  I presume traversal is
not necessary for a HIT though, which means the position in the removal
queue may not be easy to determine.

Looking over the HP paper, heap LRU seems to have been created so that the
comparison between LRU, GSDF and LFUDA could be more easily made.  I'm
guessing LRU is as quick (and simpler) as a list than as a heap.

> > In the case of simple LRU, if the queue must be traversed to find each
> > element and requeue it (perhaps this isn't the case?), I suppose one could
> > count the position in the queue and divide by the total length.
> 
> Yes, same big problems with that in LRU as displaying all objects in the
> cache ( >1 million is not uncommon cache sizes) and regex purges.

Definitely not something you want to do often :-)

> > With a heap, things are more complex.  I guess you could give an
> > indication
> > of the depth in the heap but there would be so many objects on the lowest
> > levels, I don't suppose this would be a great guide.  Is there some better
> > value available, such as the key used in the heap maybe?
> 
> There is fileno or hashed value rather than URL. You still have the same
> issues of traversal though.

Hmmm.  I guess I need to read up a little more on how heap LFUDA is
implemented.

> If you want to investigate. I'll gently nudge you towards Squid-3 where
> the rest of the development is going on and improvements have the best
> chance of survival.
>
> For further discussion you may want to bring this up in squid-dev where
> the developers hang out.

Yeah, fair points.  Thanks.

Gavin