Re: kernel.org mirroring (Re: [GIT PULL] MMC update)

Martin Langhoff wrote:

> We can make gitweb detect mod_perl and do a few smarter things if it
> is running inside it. In fact, we can (ab)use mod_perl and perl
> facilities a bit to do some serialization which will be a big win for
> some pages. What we need for that is to set a sensible ETag and
> use some IPC to announce/check if other apache/modperl processes are
> preparing content for the same ETag. The first-process-to-announce a
> given ETag can then write it to a common temp directory (atomically -
> write to a temp-name and move to the expected name) while other
> processes wait, polling for the file. Once the file is in place the
> latecomers can just serve the content of the file and exit.

First, this would (and could) work only when gitweb is served via
mod_perl. I'm not sure the IPC overhead and the implementation
complexity are worth it: this would perhaps be better solved by a
caching engine.
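For reference, the serialization scheme quoted above might look roughly
like this. It is a minimal Python sketch, not gitweb code, and it swaps
the IPC "announce" step for an exclusive-create lock file in the shared
cache directory; `serve_cached`, the `.lock` suffix and the timeout are
hypothetical, and `etag_key` is assumed to be filename-safe (e.g. a hex
digest).

```python
import os
import tempfile
import time


def serve_cached(cache_dir, etag_key, generate, poll_interval=0.05, timeout=30.0):
    """Serve the page for etag_key, letting only one process generate it.

    Hypothetical helper: a lock file created with O_EXCL stands in for the
    IPC "announce" step; generate() returns the page body as bytes.
    """
    final_path = os.path.join(cache_dir, etag_key)
    lock_path = final_path + ".lock"

    def read_final():
        with open(final_path, "rb") as f:
            return f.read()

    if os.path.exists(final_path):
        return read_final()
    try:
        # O_CREAT | O_EXCL fails if the file exists, so exactly one process wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # A latecomer: poll until the winner moves the finished file into place.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if os.path.exists(final_path):
                return read_final()
            time.sleep(poll_interval)
        return generate()  # waited too long: render our own copy, uncached
    os.close(fd)
    try:
        body = generate()
        # Write under a temporary name in the same directory, then rename;
        # os.replace is atomic on POSIX, so readers never see a partial file.
        tmp_fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(tmp_fd, "wb") as f:
            f.write(body)
        os.replace(tmp_path, final_path)
        return body
    finally:
        os.remove(lock_path)
```

A lock file left behind by a crashed process would make every latecomer
wait out the full timeout, so a real implementation would also need to
expire stale locks.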

But let us put aside for a while actual caching (writing the HTML
version of the page to a common temp directory, and serving this
static page if possible), and talk a bit about what gitweb can do with
respect to cache validation.

In addition to setting either the Expires: header or Cache-Control:
max-age, gitweb should also set the Last-Modified: and ETag: headers,
and probably also respond to conditional requests carrying
If-Modified-Since: or If-None-Match: with 304 Not Modified when the
client's copy is still current.

Would this be worth implementing?
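The validation side could be sketched like this, in Python rather than
gitweb's Perl; `conditional_status` and `validator_headers` are
hypothetical helpers. It follows the HTTP conditional-request rules
(RFC 7232): when If-None-Match is present it takes precedence and
If-Modified-Since is ignored.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def _opaque(tag: str) -> str:
    """Weak comparison: ignore a leading W/ prefix."""
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    etag is a quoted entity tag such as '"abc"'; last_modified is a UTC datetime.
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match wins; If-Modified-Since is then ignored.
        # Naive split: assumes no commas inside the entity tags.
        tags = [t.strip() for t in inm.split(",") if t.strip()]
        if "*" in tags or any(_opaque(t) == _opaque(etag) for t in tags):
            return 304
        return 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


def validator_headers(etag: str, last_modified: datetime) -> dict:
    """Headers to send with a 200 response so clients can revalidate later."""
    return {"ETag": etag,
            "Last-Modified": format_datetime(last_modified, usegmt=True)}
```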
 
> (I am calling the "state we are serving" identifier ETag because I
> think we should also set it as the ETag in the HTTP headers, so we'll
> be able to check the ETag of future requests for staleness - all we
> need is a ref lookup, and if the SHA1 matches, we are sorted). So
> having this 'unique request identifier' doubles up nicely...

For some pages an ETag is natural; for others Last-Modified: would be
more natural.

> The ETag should probably be:
>  - SHA1+displaytype+args for pages that display an object identified
>    by SHA1

What uniquely identifies the contents in "object" views ("commit",
"tag", "tree", "blob") is either h=SHA1, or hb=SHA1;f=FILENAME (when
h=SHA1 is absent). If both h=SHA1 and hb=SHA1 are present, hb=SHA1
serves only as a backlink. The "diff" views ("commitdiff", "blobdiff")
are uniquely identified by a pair of object identifiers (a pair of
SHA1s, or a pair of hb SHA1 + FILENAME).

Three of those views ("blob", "commitdiff", "blobdiff") also have a
"plain" version, so the ETag should include the displaytype (the
action, i.e. the 'a' parameter).
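A minimal Python sketch of the identification rules above, assuming the
parameters arrive as a dict; `content_etag` and `object_id` are
hypothetical, and the parent-side names hp/hpb/fp are assumed to follow
gitweb's h/hb/f naming. Hashing the key keeps filenames with quotes out
of the quoted ETag string.

```python
import hashlib

# Actions whose content is fixed by a single object identifier.
OBJECT_VIEWS = {"commit", "tag", "tree", "blob", "blob_plain"}
# Actions whose content is fixed by a pair of object identifiers.
DIFF_VIEWS = {"commitdiff", "commitdiff_plain", "blobdiff", "blobdiff_plain"}


def object_id(params, h, hb, f):
    """h=SHA1 if given (hb is then only a backlink), else hb=SHA1;f=FILENAME."""
    if params.get(h):
        return "%s=%s" % (h, params[h])
    return "%s=%s;%s=%s" % (hb, params.get(hb, ""), f, params.get(f, ""))


def content_etag(params):
    """Quoted ETag for object and diff views, or None for other views.

    The action is part of the key, so e.g. "blob" and "blob_plain" differ.
    """
    action = params.get("a")
    if action in OBJECT_VIEWS:
        ids = [object_id(params, "h", "hb", "f")]
    elif action in DIFF_VIEWS:
        ids = [object_id(params, "hp", "hpb", "fp"),
               object_id(params, "h", "hb", "f")]
    else:
        return None
    key = "\0".join([action] + ids)
    return '"%s"' % hashlib.sha1(key.encode("utf-8")).hexdigest()
```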

The hb=SHA1;f=FILENAME identifier can be converted to h=SHA1 at the
cost of one call to a git command, namely git-ls-tree (which is a bit
expensive, as it has to descend through the trees).
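That conversion could be sketched in Python as follows; `hash_by_path`
is a hypothetical helper and `git_dir` an assumed path to the
repository. The `-z` option makes git-ls-tree terminate records with NUL
and print paths unquoted, which keeps parsing simple.

```python
import subprocess


def parse_ls_tree_record(record):
    """Split one 'git ls-tree -z' record: '<mode> SP <type> SP <object> TAB <path>'."""
    meta, path = record.split("\t", 1)
    mode, obj_type, obj = meta.split(" ")
    return mode, obj_type, obj, path


def hash_by_path(git_dir, hb, path):
    """Return the SHA1 of `path` in tree-ish `hb`, or None if it does not exist."""
    out = subprocess.run(
        ["git", "--git-dir=" + git_dir, "ls-tree", "-z", hb, "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    records = [r for r in out.split("\0") if r]
    return parse_ls_tree_record(records[0])[2] if records else None
```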

The ETag can simply be the args (query string), if all h/hb/hbp
parameters are full SHA1s. Alternatively, the ETag can be the SHA1 of
the object (or a pair of SHA1s in the case of diffs), but this is a
little more costly to verify, although we usually (always?) convert
hb=SHA1;f=FILENAME to h=SHA1 anyway when generating the page.

Usually you can compare ETags based on the URL alone.
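Deriving the ETag from the URL alone might look like this Python
sketch; `etag_from_query` is hypothetical, and the hash parameter names
are assumed to follow gitweb's h/hp/hb/hpb. It gives up (returns None)
whenever a hash parameter is not a full lowercase SHA1, since a ref name
can move and an abbreviated SHA1 can become ambiguous.

```python
import hashlib
import re
from urllib.parse import parse_qsl

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")
# Hash-carrying parameters; names assumed to follow gitweb's h/hp/hb/hpb.
HASH_PARAMS = ("h", "hp", "hb", "hpb")


def etag_from_query(query):
    """Derive an ETag from the query string alone, or return None."""
    # gitweb separates parameters with ';'; a literal ';' in a value is %-encoded.
    params = dict(parse_qsl(query.replace(";", "&")))
    for key in HASH_PARAMS:
        if key in params and not FULL_SHA1.fullmatch(params[key]):
            return None  # not a full SHA1: the URL alone does not pin the content
    # Sort so that parameter order does not change the tag.
    canonical = repr(sorted(params.items()))
    return '"%s"' % hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

Because every parameter (project, action, filename) goes into the key,
"blob" and "blob_plain" for the same object get different tags.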
   
>  - refname+SHA1+displaytype+args for pages that display something
>    identified by a ref

For object views we can simply convert the refname to a SHA1, though
I'm not sure it is worth it. In cases where the view has to calculate
the SHA1 of the object anyway, we can return (and validate) a
SHA1-based ETag as above.
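Validating such a request then costs a single ref lookup. A minimal
Python sketch, assuming `git rev-parse --verify` as the lookup;
`rev_parse` and `etag_for_ref_view` are hypothetical, and the resolver
is passed in so the ETag logic can be used without a repository.

```python
import subprocess


def rev_parse(git_dir, name):
    """Resolve a ref name (or other revision) to a full SHA1 with git-rev-parse."""
    if name.startswith("-"):
        raise ValueError("refusing option-like revision name: %r" % name)
    return subprocess.run(
        ["git", "--git-dir=" + git_dir, "rev-parse", "--verify", name],
        capture_output=True, text=True, check=True,
    ).stdout.strip()


def etag_for_ref_view(action, refname, resolve):
    """ETag for an object view requested by ref name.

    resolve(refname) -> SHA1; the SHA1, not the ref name, goes into the tag,
    so the tag changes exactly when the ref is moved.
    """
    return '"%s;%s"' % (action, resolve(refname))
```

If the client's If-None-Match equals the freshly computed tag, gitweb
could answer 304 without generating the page.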

- ETag and/or Last-Modified headers for "log" views: "log",
  "shortlog" (which is also part of the summary view), "history", and
  the "rss"/"atom" views.

On one hand, all log views (at least for now) are identified by their
parameters (action/view name, plus the filename in the case of the
history view) and the SHA1 of the top commit. On the other hand, it
might be easier to use Last-Modified with the date of the top commit...
Verifying a SHA1-based ETag could add some overhead in the case of a
miss.
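Both validators for a log view can come from one lookup of the top
commit. A minimal Python sketch, assuming that lookup is
`git log -1 --format=%H%n%ct <rev>` (full SHA1 and committer time as a
UNIX timestamp); `parse_top_commit` and `log_view_validators` are
hypothetical helpers.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def parse_top_commit(output):
    """Parse `git log -1 --format=%H%n%ct <rev>` output into (sha1, unix_time)."""
    sha1, ctime = output.split()
    return sha1, int(ctime)


def log_view_validators(action, top_sha1, committer_time, filename=None):
    """ETag and Last-Modified for a log-like view ("log", "shortlog", "history", ...)."""
    key = "\0".join([action, filename or "", top_sha1])
    etag = '"%s"' % hashlib.sha1(key.encode("utf-8")).hexdigest()
    last_modified = format_datetime(
        datetime.fromtimestamp(committer_time, timezone.utc), usegmt=True)
    return {"ETag": etag, "Last-Modified": last_modified}
```

One caveat for the Last-Modified variant: committer dates are not
guaranteed to increase, so a ref moved to a commit with an older date
would not look modified, while the SHA1-based ETag would still change.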

>  - SHA1(names and sha1s of all refs) for the summary page

Wouldn't it be simpler to just set the Last-Modified: header (and check
it)?
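For comparison, the quoted ETag proposal is cheap to compute from the
output of `git for-each-ref --format='%(objectname) %(refname)'`. A
minimal Python sketch; `parse_for_each_ref` and `summary_etag` are
hypothetical. The Last-Modified alternative would instead take the
newest committer date among the ref tips.

```python
import hashlib


def parse_for_each_ref(output):
    """Parse `git for-each-ref --format='%(objectname) %(refname)'` output."""
    refs = []
    for line in output.splitlines():
        if line:
            sha1, refname = line.split(" ", 1)  # ref names cannot contain spaces
            refs.append((refname, sha1))
    return refs


def summary_etag(refs):
    """SHA1 over the names and SHA1s of all refs, independent of their order."""
    h = hashlib.sha1()
    for refname, sha1 in sorted(refs):
        h.update(("%s %s\n" % (sha1, refname)).encode("utf-8"))
    return '"%s"' % h.hexdigest()
```

The difference matters when a branch is rewound: the ref list (and so
this ETag) changes, while the newest commit date may stay the same.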


P.S. Can anyone post benchmarks comparing gitweb deployed under
mod_perl with gitweb deployed as a CGI script? Does kernel.org use the
mod_perl or the CGI version of gitweb?

-- 
Jakub Narebski
Poland
