Re: kernel.org mirroring (Re: [GIT PULL] MMC update)

Jeff Garzik wrote:
> Jakub Narebski wrote:

>> In addition to setting either Expires: header or Cache-Control: max-age
>> gitweb should also set Last-Modified: and ETag headers, and also 
>> probably respond to If-Modified-Since: and If-None-Match: requests.
>> 
>> Would it be worth implementing this?
> 
> IMO yes, since most major browsers, caches, and spiders support these 
> headers.
 
Sending Last-Modified: should be easy; sending ETag needs some consensus
on its contents, mainly with regard to validation. Responding to
If-Modified-Since: and If-None-Match: should cut at least _some_ of the
page generation time. If the ETag can be calculated from the URL alone,
then we can handle If-None-Match: right at the beginning of the script,
before any page generation starts.
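
A minimal sketch of that early exit, written in Python rather than gitweb's
Perl. The ETag scheme (a SHA-1 over action and object hash) and the names
make_etag, etag_matches and respond are assumptions for illustration, not
gitweb code; the approach only applies to views whose output is fully
determined by the URL parameters.

```python
import hashlib


def make_etag(action, object_hash):
    """Derive an ETag from URL parameters only (hypothetical scheme)."""
    digest = hashlib.sha1(f"{action}:{object_hash}".encode("utf-8")).hexdigest()
    return f'"{digest}"'


def _strip_weak(tag):
    """Drop the W/ prefix so weak and strong forms compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, etag):
    """Weak comparison of an If-None-Match header value against etag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    # Splitting on commas is safe here only because our ETags are hex digests.
    candidates = {_strip_weak(c.strip()) for c in if_none_match.split(",")}
    return _strip_weak(etag) in candidates


def respond(environ, action, object_hash, render):
    """Return (status, headers, body); skip rendering when the ETag matches."""
    etag = make_etag(action, object_hash)
    if etag_matches(environ.get("HTTP_IF_NONE_MATCH"), etag):
        return "304 Not Modified", [("ETag", etag)], b""
    return "200 OK", [("ETag", etag)], render(action, object_hash)
```

The point of the design is that render (the expensive part) is never called
when the client already holds a current copy.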
 
>> For some pages ETag is natural; for other Last-Modified: would be more
>> natural.
> 
> Yes, a good point to note.
> 
>> Usually you can compare ETags based on URL alone.
> 
> Mostly true:  you must also consider HTTP_ACCEPT

Well, yes, ETag is an HTTP/1.1 header.

>> Wouldn't it be simpler to just set Last-Modified: header (and check
>> it?)
> 
> That would be a good start, and suffice for many cases.  If the CGI can 
> simply stat(2) files rather than executing git-* programs, that would 
> increase efficiency quite a bit.

As I said, I'm not talking (at least for now) about saving generated HTML
output. That, I think, is better solved by a caching engine such as Squid.
Even there, though, some git-specific knowledge can help: we can invalidate
the cache on push, and we know that some results never change (well,
except when the output of gitweb itself changes).

> A core problem with cache hints via HTTP headers (last-modified, etc.) 
> is that you don't achieve caching across multiple clients, just across 
> repeated queries from the same client (or caching proxy).
> 
> At least for the RSS/Atom feeds and the git main page, it makes no sense 
> to regenerate that data repeatedly.
> 
> Internally, gitweb would need to do a stat() on key files, and return 
> pre-generated XML for the feeds if the stat() reveals no changes.  Ditto 
> for the front page.

I'm not sure whether it is worth implementing in gitweb, or whether it is
better left to a caching engine. With the projects list page and the summary
page there is an additional problem with relative dates, although this can be
solved using Jonas Fonseca's idea of putting absolute dates in the page and
using ECMAScript (JavaScript) to convert them to relative ones: on load, and
perhaps on a timer ;-)
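
Should the stat()-based approach end up in gitweb after all, it could look
roughly like the sketch below (Python for illustration, not gitweb's Perl).
Which files count as "key files" is an assumption left to the caller, and
the names newest_mtime and cached_or_regenerate are hypothetical.

```python
import os


def newest_mtime(paths):
    """Latest modification time among the key files that exist."""
    times = [os.stat(p).st_mtime for p in paths if os.path.exists(p)]
    return max(times) if times else None


def cached_or_regenerate(key_files, cache_path, generate):
    """Return cached bytes unless a key file is newer than the cache file."""
    newest = newest_mtime(key_files)
    if newest is not None and os.path.exists(cache_path):
        # Note: with coarse mtime resolution a change made in the same
        # second as the cache write can be missed.
        if os.stat(cache_path).st_mtime >= newest:
            with open(cache_path, "rb") as fh:
                return fh.read()
    body = generate()  # expensive path: would run the git commands
    with open(cache_path, "wb") as fh:
        fh.write(body)
    return body
```

Pages containing relative dates would still go stale under this scheme, which
is where the absolute-dates idea above comes in.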


What can be _easily_ done:
 * Use post-1.4.4 gitweb, which uses git-for-each-ref to generate the summary
   page; this makes the summary page around 3 times faster.
 * Perhaps use a projects list file (which can now be generated by gitweb)
   instead of scanning directories and stat()-ing for the owner; this would
   help with the time needed to generate the projects list page.

What can be quite easily incorporated into gitweb:
 * For immutable pages, set Expires: or Cache-Control: max-age (or both)
   to infinity.
 * Calculate a hash+action based ETag, at least for those actions where it is
   easy, and respond with 304 Not Modified as early as possible.
   This might require some code reorganization so that no output is written
   before the ETag is calculated and compared (If-Match, If-None-Match).
 * Generate Last-Modified: for those views where it can be calculated,
   and respond with 304 Not Modified as early as possible.
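
As a rough sketch of the max-age and Last-Modified: items (Python for
illustration, not gitweb code): one year is used as a practical stand-in for
"infinity", since RFC 2616 advises against Expires dates more than one year
ahead. The function names and the ONE_YEAR constant are assumptions.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime

ONE_YEAR = 365 * 24 * 3600  # stand-in for "infinity" (assumption)


def immutable_headers():
    """Cache headers for pages that can never change."""
    return [("Cache-Control", f"public, max-age={ONE_YEAR}")]


def last_modified_header(mtime):
    """Format a Unix timestamp as an HTTP date in GMT."""
    return ("Last-Modified", formatdate(mtime, usegmt=True))


def not_modified_since(if_modified_since, mtime):
    """True when the client's copy is at least as new as mtime."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparsable date: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop fractional seconds.
    return int(mtime) <= since.timestamp()
```

When not_modified_since returns True the script can answer 304 Not Modified
without generating the page body.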

What can be easily done using a caching engine:
 * Select the top 10 most common queries and cache them, invalidating the
   cache on push (depending on the query: for example, invalidate the project
   list on push to any project, but invalidate RSS/Atom feeds and summary
   pages only on push to the specific project); this can be done with git
   hooks.
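
The invalidation rule can be sketched as follows. This is an in-memory toy in
Python, not a Squid configuration; PageCache, the view names and the use of
None as the key for the projects list are all assumptions for illustration.

```python
PROJECT_LIST = None  # key used for the site-wide projects list (assumption)
PER_PROJECT_VIEWS = {"summary", "rss", "atom"}  # views a push makes stale


class PageCache:
    """Toy cache keyed by (project, view)."""

    def __init__(self):
        self._pages = {}

    def put(self, project, view, body):
        self._pages[(project, view)] = body

    def get(self, project, view):
        return self._pages.get((project, view))

    def invalidate_on_push(self, pushed_project):
        """Drop the projects list plus the pushed project's mutable views."""
        stale = [
            (project, view)
            for (project, view) in self._pages
            if project is PROJECT_LIST
            or (project == pushed_project and view in PER_PROJECT_VIEWS)
        ]
        for key in stale:
            del self._pages[key]
        return stale
```

A git hook run after a push would call invalidate_on_push with the name of
the repository; views addressed by object hash are left alone because their
content does not change.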
-- 
Jakub Narebski
Poland