Re: [RFD] My thoughts about implementing gitweb output caching

Hi,

Thanks for these design notes.  A few uninformed reactions.

Jakub Narebski wrote:

> There was a request to support installing gitweb modules in a separate
> directory, but that would require changes to the "gitweb: Prepare for
> splitting gitweb" patch (but it is doable).  Is there wider interest
> in supporting such a feature?

If you are referring to my suggestion, I see no reason to wait on
that.  The lib/ dir can be made configurable later.
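
Something like the following, say, would be enough eventually (the
GITWEB_LIBDIR variable and ++GITWEBLIBDIR++ placeholder are invented
here, just following gitweb's existing ++FOO++ build-time substitution
convention):

  # hypothetical: let an environment variable or a path baked in at
  # build time say where the split-out gitweb modules live
  use lib split /:/, ($ENV{'GITWEB_LIBDIR'} || '++GITWEBLIBDIR++');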

> The simplest solution is to use $cgi->self_url() (note that what J.H. v8
> uses, i.e. "$my_url?" . $ENV{'QUERY_STRING'}, where $my_url is just
> $cgi->url(), is not enough - it doesn't take path_info into account).
>
> An alternate solution, which I used in my rewrite, is to come up with a
> "canonical" URL, e.g. href(-replay => 1, -full => 1, -path_info => 0);
> with this solution, using path_info vs. query parameters or reordering
> query parameters still gives the same key.

It is easy to miss dependencies on parts of the URL that are being
fuzzed out.  For example, the <base href...> tag is only inserted with
path_info.  Maybe it would be less risky to first use self_url(), then
canonicalize it in a separate patch?
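
(For concreteness, my mental model of the two candidate keys is
roughly the following; untested, and the md5 step and $cache_dir are
only my own illustration of turning whichever URL is chosen into a
cache file name:

  use CGI;
  use Digest::MD5 qw(md5_hex);

  our $cgi = CGI->new();
  my $cache_dir = '/var/cache/gitweb';    # placeholder location

  # simplest key: the literal URL of this request, path_info included
  my $key_simple = $cgi->self_url();

  # canonical key: rebuilt from the parsed parameters via gitweb's own
  # href(), so the path_info and query-string forms of the same page
  # share one cache entry
  my $key_canonical = href(-replay => 1, -full => 1, -path_info => 0);

  # hash whichever key is chosen to get a filesystem-friendly name
  my $cache_file = "$cache_dir/" . md5_hex($key_canonical);

The interesting part, of course, is which of the two keys to hash.)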

> J.H.'s patches up to and including v7, and my rewrite up to and including
> v6, excluded error pages from caching.  I think that the original
> reasoning behind choosing to do it this way was that A.) each specific
> error page is usually accessed only once, so caching them would only
> take up space, bloating the cache, and, what is more important, B.) you
> can't cache errors from the caching engine itself.

Perhaps there is a user experience reason?  If I receive an error page
due to a problem with my repository then, all else being equal, I would
prefer that it be fixed the next time I reload.  By comparison, having
to reload multiple times to forget an obsolete non-error response
would be less aggravating and perhaps expected.

But the benefit from caching e.g. a response from a broken link would
outweigh that.

> The second is if there is no stale data to serve (or the data is too
> stale), but we have a progress indicator.  In this case the foreground
> process is responsible for rendering the progress indicator, and the
> background process is responsible for generating the data.  The
> foreground process then waits for the data to be generated (unless the
> progress info subroutine exits), so strictly speaking we don't need to
> detach the background process in this case.

What happens when the client gets tired of waiting and closes the
connection?
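
As I understand it, the scheme is roughly the following (the helpers
cache_ready(), generate_to_cache() and print_from_cache() are invented
names, not taken from your patches):

  $| = 1;                      # flush progress output as it is printed
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;

  if ($pid == 0) {
      # background: produce the page into the cache
      generate_to_cache($key);
      exit 0;
  }

  # foreground: entertain the client, then serve from the cache
  until (cache_ready($key)) {
      print '.';               # dies on SIGPIPE once the client is gone
      sleep 1;
  }
  print_from_cache($key);

If so, the foreground process presumably dies from SIGPIPE on the next
progress write, and the question is what is supposed to happen to the
child then.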

> With output caching gitweb can also support 'Range' requests, which
> means that it would support resumable downloads.  This would mean that
> we would be able to resume downloading of a snapshot (or, in the future,
> a bundle)... if we cannot do this now.  This would require some more
> code to be added.

Exciting stuff.
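
On top of a cache file it might not even take much; hand-waving the
rest of the header plumbing and using an invented $cache_file, I would
expect something along the lines of:

  # serve a single byte range out of an already-generated cache file
  if (($ENV{'HTTP_RANGE'} || '') =~ /^bytes=(\d+)-(\d*)$/) {
      my $size = -s $cache_file;
      my $beg  = $1;
      my $end  = ($2 ne '' && $2 < $size) ? $2 : $size - 1;

      print "Status: 206 Partial Content\r\n",
            "Content-Range: bytes $beg-$end/$size\r\n",
            "Content-Length: " . ($end - $beg + 1) . "\r\n",
            "\r\n";

      open my $fh, '<', $cache_file or die "cannot open cache file: $!";
      binmode $fh;
      seek $fh, $beg, 0;
      read $fh, my $buf, $end - $beg + 1;
      print $buf;
  }

plus the usual care about multi-range requests, If-Range and so on.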

Teaching gitweb to generate bundles sounds like a recipe for high server
load, though.  I suspect manual (or cronjob-driven) generation would work
better, with the possible exception of very frequently cloned and
infrequently pushed-to repos like Linus's linux-2.6.

Jonathan

