On 12/9/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> Actually, just looking at the examples, it looks like memcached is fundamentally flawed, exactly the same way Apache mod_cache is fundamentally flawed.
I don't know about fundamentally flawed, but (having used memcached) I don't think it's a big win for this at all.

We can make gitweb detect mod_perl and do a few smarter things when it is running inside it. In fact, we can (ab)use mod_perl and Perl facilities a bit to do some serialization, which will be a big win for some pages. What we need for that is to set a sensible ETag and use some IPC to announce/check whether other apache/mod_perl processes are preparing content for the same ETag.

The first process to announce a given ETag can then write it to a common temp directory (atomically - write to a temp name and rename to the expected name) while the other processes wait, polling for the file. Once the file is in place the latecomers can just serve the content of the file and exit.

(I am calling the "state we are serving" identifier ETag because I think we should also set it as the ETag in the HTTP headers, so we'll be able to check the ETag of future requests for staleness - all we need is a ref lookup, and if the SHA1 matches, we are sorted). So having this 'unique request identifier' doubles up nicely...

The ETag should probably be:

 - SHA1+displaytype+args for pages that display an object identified by SHA1
 - refname+SHA1+displaytype+args for pages that display something identified by a ref
 - SHA1(names and sha1s of all refs) for the summary page
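A minimal sketch of what that file-based serialization could look like, assuming a shared cache directory and a hypothetical render callback that produces the page body; the announce step here is an O_EXCL lock file, and the polling interval and timeouts are made-up numbers:

    use strict;
    use warnings;
    use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
    use File::Temp qw(tempfile);
    use Time::HiRes qw(usleep);

    my $cache_dir = '/var/cache/gitweb';   # assumption: directory shared by all processes

    # Serve the page identified by $etag, generating it at most once
    # across all apache/mod_perl processes.
    sub serve_cached {
        my ($etag, $render) = @_;          # $render: code ref returning the page body
        my $file = "$cache_dir/$etag";
        my $lock = "$cache_dir/workingon-$etag";

        return slurp($file) if -e $file;   # already generated by someone else

        # Announce that we are preparing this ETag; O_EXCL means first one wins.
        if (sysopen(my $fh, $lock, O_WRONLY | O_CREAT | O_EXCL)) {
            close $fh;
            # Write to a temp name, then rename atomically into place.
            my ($tmp_fh, $tmp_name) = tempfile(DIR => $cache_dir);
            print $tmp_fh $render->();
            close $tmp_fh;
            rename $tmp_name, $file or die "rename failed: $!";
            unlink $lock;
            return slurp($file);
        }

        # Someone else announced it first: poll until the file shows up.
        for (1 .. 100) {                   # arbitrary: ~10s at 100ms per poll
            return slurp($file) if -e $file;
            usleep(100_000);
        }
        return $render->();                # give up; the announcer may have died
    }

    sub slurp {
        my ($path) = @_;
        open my $fh, '<', $path or die "open $path: $!";
        local $/;
        return <$fh>;
    }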
> You can't have a cache architecture where the client just does a "get", like memcached does. You need to have a "read-for-fill" operation, which says:
You _could_ make do with a convention of polling for "entryname" and "workingon-entryname": if "workingon-entryname" is set to 1, you can expect "entryname" to be filled real soon now. However, memcached is completely memory-bound, so it is only nice for really small stuff or for a large server farm that has gobs of spare RAM.

(Note that memcached does have timeouts, which means that the 'workingon' value could have a short timeout in case the request is cancelled or the process dies - the nasty bit in the above plan would be the polling.)
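For completeness, here is the same convention sketched on top of memcached with the Cache::Memcached client; add() only succeeds for the first process, which stands in for the announce step, and the short expiry on the "workingon" key is the timeout mentioned above. The key names, lifetimes and the render callback are just illustrative:

    use strict;
    use warnings;
    use Cache::Memcached;
    use Time::HiRes qw(usleep);

    my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

    # Same idea as the file-based scheme, but keyed in memcached.
    sub serve_memcached {
        my ($etag, $render) = @_;          # $render: code ref returning the page body

        my $page = $memd->get($etag);
        return $page if defined $page;

        # add() stores the key only if nobody else has: first-to-announce wins.
        # The 30s expiry covers the announcer dying or the request being cancelled.
        if ($memd->add("workingon-$etag", 1, 30)) {
            $page = $render->();
            $memd->set($etag, $page, 300);     # arbitrary 5 minute lifetime
            $memd->delete("workingon-$etag");
            return $page;
        }

        # Somebody else is rendering it: poll (the nasty bit) until it appears.
        for (1 .. 100) {                       # ~10s at 100ms per poll
            usleep(100_000);
            $page = $memd->get($etag);
            return $page if defined $page;
        }
        return $render->();                    # give up and render locally
    }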
> I still don't understand why apache doesn't do it. I guess it wants to be stateless or something.
Apache doesn't do it because most web applications don't use the HTTP protocol correctly - especially when it comes to the idempotency of GET. So in 99% of cases, web apps serve truly different pages for the same GET request, depending on your cookie, IP address, time of day, etc.

Most websites deal with very little traffic, so this isn't a problem. And many large sites that serve a lot of traffic from a dynamic web app want to be serving custom ads, letting you log in and see your personalised toolbar, etc., so this wouldn't work for them either.

So in practice, serialising speculatively on GET requests for the same URL has very little payoff except for static content. And that's quite fast anyway... especially if the underlying OS is smokin' fast ;-)

cheers,

martin