Olaf Alders wrote:

> On 2010-12-09, at 5:52 PM, Jonathan Nieder wrote:
>> HTTP::BrowserDetect uses a blacklist as far as I can tell.  Maybe in
>> the long term it would be nice to add a whitelist ->human() method.
>>
>> Cc-ing Olaf Alders for ideas.
>
> Thanks for including me in this. :)  I'm certainly open to patching
> the module, but I'm not 100% clear on how you would want to
> implement this.  How is ->is_human different from !->is_robot?  To
> clarify, I should say that from the snippet above, I'm not 100% clear
> on what the problem is which needs to be solved.

Context (sorry I did not include this in the first place): the caching
code (in development) for git's web interface uses a page that says
"Generating..." for cache misses, with an HTTP refresh redirecting to
the generated content.  The big downside is that, done naively, this
breaks wget, curl, and similar user agents, which are not patient
enough to wait for the actual content and end up grabbing the redirect
page instead.

The first solution tried was to explicitly special-case wget and curl.
But in this case it is better to be more inclusive[2]: when in doubt,
leave out the nice "Generating..." page and just serve the actual
content slowly.

In other words, the idea was that user agents fall into three
categories:

 A. definitely will not replace content with the target of an HTTP refresh
 B. definitely will replace content with the target of an HTTP refresh
 C. unknown

and maybe ->is_robot could return true for A and ->is_human return
true for B (leaving C as !->is_human && !->is_robot).  In that case,
we should show the "Generating..." page only in the ->is_human (B)
case.

That said, I know almost nothing about this subject, so it is likely
this analysis misses something.  J.H. or Jakub can likely say more.

Thanks,
Jonathan
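
For concreteness, a rough Perl sketch of the decision gitweb could make
under that scheme follows.  Note the assumptions: ->is_human is the
proposed whitelist method from this thread, not existing
HTTP::BrowserDetect API, and cache_is_stale() plus the page_*() subs are
placeholder stubs standing in for gitweb's real caching and output code.

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::BrowserDetect;

# Parse the client's User-Agent header.
my $browser = HTTP::BrowserDetect->new($ENV{HTTP_USER_AGENT} || '');

if (cache_is_stale()) {
	if ($browser->can('is_human') && $browser->is_human) {
		# Case B: the agent is known to follow an HTTP refresh,
		# so the "Generating..." placeholder page is safe.
		page_generating_with_refresh();
	} else {
		# Case A (robot) or C (unknown): skip the placeholder
		# and serve the real content, however slowly.
		page_generate_and_serve();
	}
} else {
	page_serve_from_cache();
}

# Placeholder stubs so the sketch runs standalone; gitweb would supply
# the real versions of these.
sub cache_is_stale               { 1 }
sub page_generating_with_refresh { print "placeholder 'Generating...' page\n" }
sub page_generate_and_serve      { print "freshly generated page\n" }
sub page_serve_from_cache        { print "cached page\n" }

The ->can('is_human') guard makes the sketch fall back to serving the
content directly when the method is not available, which matches the
"when in doubt, serve the content" rule above.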