> Hi,
>
> I'm wondering about ways to measure the optimum size for a cache, in
> terms of the "value" you gain from each GB of cache space. If you've
> got a 400GB cache and 99% of your hits come from the first 350GB,
> there's probably no point looking for a larger cache. If only 80% come
> from the first 350GB, then a bigger cache might well be useful.

Squid suffers from a bit of an anachronism in the way it stores objects.
The classic UFS systems essentially use round-robin and a hash to
determine the storage location for each object separately. This works
wonders for ensuring no clashes, but not so well for retrieval
optimization.

Adrian Chadd has done a lot of study and some work in this area,
particularly for Squid-2.6/2.7. His NYCBSDCon paper is a good read on
how disk storage relates to Squid:
http://www.squid-cache.org/~adrian/talks/20081007%20-%20NYCBSDCON%20-%20Disk%20IO.pdf

> I realise there are rules of thumb for cache size; it would be
> interesting to be able to analyse a particular Squid installation.

Feel free. We would be interested in any improvements you can come up
with.

> Squid obviously removes objects from its cache based on the chosen
> cache_replacement_policy. It appears from the comments in squid.conf
> that in the case of the LRU policy, this is implemented as a list,
> presumably a queue of pointers to objects in the cache. Objects which
> come to the head of the queue are presumably next for removal. I guess
> if an object in the cache gets used it goes back to the tail of the
> queue. I suppose this process must involve linearly traversing the
> queue to find the object and remove it, which is presumably why
> heap-based policies are available.

IIRC there is a doubly-linked list with a tail pointer for LRU.

> I wonder if it would be feasible to calculate a "cache rank", which
> indicates the position an object was at within the queue at the time
> of the hit. So, perhaps 0 means at the tail of the queue, 1 means at
> the head. If this could be reported alongside each hit in the
> access.log, one could draw stats on the amount of hits served by each
> portion of the queue and therefore determine the value of expanding or
> contracting your cache.
>
> In the case of simple LRU, if the queue must be traversed to find each
> element and requeue it (perhaps this isn't the case?), I suppose one
> could count the position in the queue and divide by the total length.

Yes, that hits the same big problem in LRU as displaying all objects in
the cache (>1 million objects is not an uncommon cache size) and as
regex purges: it requires a full traversal of the list.

> With a heap, things are more complex. I guess you could give an
> indication of the depth in the heap, but there would be so many
> objects on the lowest levels that I don't suppose this would be a
> great guide. Is there some better value available, such as the key
> used in the heap maybe?

There is the fileno or a hashed value rather than the URL. You still
have the same traversal issues, though.

> Or perhaps the whole idea is flawed somehow?
>
> Comments, criticisms, explanations, rebukes all welcome.
> Gavin

If you want to investigate, I'll gently nudge you towards Squid-3,
where the rest of the development is going on and where improvements
have the best chance of survival. For further discussion you may want
to bring this up on squid-dev, where the developers hang out.

Amos
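
PS. To make the "cache rank" idea concrete, here is a rough standalone
sketch in C++ (my own illustration, not Squid source; the Entry and
LruList names are invented for the example). It models the LRU list as
the doubly-linked list with a tail pointer mentioned above. Note that
with such a list the requeue-on-hit needs no traversal at all; the O(n)
walk only appears when computing the proposed rank: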
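
#include <cstddef>
#include <iostream>
#include <string>

struct Entry {
    std::string key;          // stand-in for a cached object
    Entry *prev = nullptr;
    Entry *next = nullptr;
};

struct LruList {
    Entry *head = nullptr;    // LRU end: next candidate for removal
    Entry *tail = nullptr;    // MRU end: most recently used
    std::size_t count = 0;

    // Unlink an entry in O(1) -- no search needed, because each entry
    // carries its own prev/next pointers. This is the point of the
    // doubly-linked list with tail pointer.
    void unlink(Entry *e) {
        if (e->prev) e->prev->next = e->next; else head = e->next;
        if (e->next) e->next->prev = e->prev; else tail = e->prev;
        e->prev = e->next = nullptr;
        --count;
    }

    // Append at the MRU end.
    void pushTail(Entry *e) {
        e->prev = tail;
        e->next = nullptr;
        if (tail) tail->next = e; else head = e;
        tail = e;
        ++count;
    }

    // On a hit, move the entry back to the MRU end: two O(1) steps.
    void touch(Entry *e) { unlink(e); pushTail(e); }

    // The proposed "cache rank": 0.0 at the tail (most recently used),
    // 1.0 at the head (next for removal). This walk is the expensive
    // part: O(n) per hit over a list that may hold >1 million entries.
    double rank(const Entry *e) const {
        std::size_t pos = 0;
        for (const Entry *p = tail; p && p != e; p = p->prev)
            ++pos;
        return count > 1 ? double(pos) / double(count - 1) : 0.0;
    }
};

int main() {
    LruList lru;
    Entry a{"a"}, b{"b"}, c{"c"};
    lru.pushTail(&a);
    lru.pushTail(&b);
    lru.pushTail(&c);     // order, head -> tail: a b c

    lru.touch(&a);        // a becomes MRU; order is now: b c a
    std::cout << "rank(a) = " << lru.rank(&a) << "\n";   // 0 (MRU)
    std::cout << "rank(b) = " << lru.rank(&b) << "\n";   // 1 (next out)
}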
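
That rank() walk is exactly the traversal problem mentioned earlier: on
a cache of a million-plus objects it is far too expensive to run on
every hit, so any real implementation would probably have to sample
only a small fraction of hits rather than rank each one.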