Re: kernel.org mirroring (Re: [GIT PULL] MMC update)

Linus Torvalds <torvalds@xxxxxxxx> · Fri, 8 Dec 2006 15:27:42 -0800 (PST)

On Fri, 8 Dec 2006, Jeff Garzik wrote:
> 
> This is quite nice and easy, if memory-only caching works for the situation:
> http://www.danga.com/memcached/
> 
> There are APIs for C, Perl, and plenty of other languages.

Actually, just looking at the examples, it looks like memcached is 
fundamentally flawed, exactly the same way Apache mod_cache is 
fundamentally flawed.

Exactly like mod_perl, it appears that if something isn't cached, the 
memcached server will just return "not cached" to everybody, and all the 
clients will, like a stampeding herd, all do the uncached access. Even if 
they have the exact same query. And you're back to square one: your server 
load went through the roof.

You can't have a cache architecture where the client just does a "get", 
like memcached does. You need to have a "read-for-fill" operation, which 
says:

 - get this cache entry

 - if this cache entry does not exist, get an exclusive lock

 - if you get that exclusive lock, return NULL, and the client promises 
   that it will fill it (inside the kernel, see for example 
   "find_get_page()" vs "grab_cache_page()" - the latter will return a 
   locked page whether it exists or not, and if it didn't exist, it will 
   have inserted it into the cache datastructures so that you don't have 
   multiple concurrent readers trying to all create different pages)

 - if you block on the exclusive lock, that means that some other client 
   is busy fulfilling it. When you unblock, do a regular "read" operation 
   (not a "repeat": we only block once, and if that fails, that's it).

 - any cachefill operation will release the lock (and allow pending 
   cache queries to succeed)

 - the locking client going away will release the lock (and allow pending 
   cache queries to fail, and hopefully cause a "set cache" operation)

 - a timeout (settable by some method) will also force-release a lock in 
   the case of buggy clients that do "read-for-modify" but never do the 
   "modify".

The "timeout" thing is to handle the case of buggy clients that crash 
after trying to get - it will slow down things _enormously_ if that 
happens, but hey, it's a buggy client. And it will still continue to work.

Looking at the memcached operations, they have the "read" op (aka "get"), 
but they seem to have no "read-for-fill" op. So memcached fundamentally 
doesn't fix this problem, at least without explicit serialization by the 
client.

(The serialization could be done by the client, but that would serialize 
_everything_, and mean that a uncached lookup will hold up all the cached 
ones too - which is why you do NOT want to serialize in the caller: you 
really want to serialize in the layer that does the caching).

It's fairly easy to do the lock. You could just hash the lookup key using 
some reasonable hash. It doesn't even have to be a _big_ hash: it's ok to 
have just a few bits for lock hashing, since it's only going to be for 
misses.

So hashing to eight bits and using 256 locks is probably fine, as long as 
this is done by the cache server. That means that the cache server only 
ever needs to track that many timeouts, for example (it also indirectly 
sets a limit on the number of possible "outstanding uncached requests", 
which is _exactly_ what you want - but hash collissions will also 
potentially unlock the _wrong_ bucket, so if you have too many of them, it 
can make the "only one outstanding unhashed request per key" not be as 
effective).

So assuming you get good cache hit statistics, the locking shouldn't be a 
big issue. But you definitely want to do it, because the whole point of 
caching was to not do the same op multiple times. 

I still don't understand why apache doesn't do it. I guess it wants to be 
stateless or something.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html