Re: is there a fast web-interface to git for huge repos?

On 7 June 2013 13:13, Charles McGarvey <chazmcgarvey@xxxxxxxxxxxxxxxx> wrote:
> On 06/07/2013 01:02 PM, Constantine A. Murenin wrote:
>>> That's a one-time penalty. Why would that be a problem? And why is wget
>>> even mentioned? Did we misunderstand each other?
>>
>> `wget` or `curl --head` would be used to trigger the caching.
>>
>> I don't understand how it's a one-time penalty.  No one wants to
>> look at an old copy of the repository, so if, say, I want to have a
>> gitweb of all 4 BSDs, updated daily, then even with lots of RAM
>> (e.g. to eliminate the cold-cache 5s penalty and reduce each page to
>> 0.5s), on a quad-core box I'd be lucky to complete a generation of
>> all the pages within 12h or so, obviously using the machine at or
>> above 50% capacity just for the caching.  Or several days, or even a
>> couple of weeks, on an Intel Atom or VIA Nano with 2GB of RAM or so.
>> That's obviously not acceptable; there has to be a better solution.
>>
>> One could, I guess, only regenerate the pages which have changed,
>> but it still sounds like an ugly solution, where you'd have to
>> generate a list of files that have changed between one generation
>> and the next, and you'd still have very high CPU, cache and storage
>> requirements.
>
> Have you already ruled out caching on a proxy?  Pages would only be generated
> on demand, so the first visitor would still experience the delay but the rest
> would be fast until the page expires.  Even expiring pages every five
> minutes or less would probably provide significant processing savings
> (depending on how many users you have), and that level of staleness and the
> occasional delays may be acceptable to your users.
>
> As you say, generating the entire cache upfront and continuously is wasteful
> and probably unrealistic, but any type of caching, by definition, is going to
> involve users seeing stale content, and I don't see that you have any other
> option but some type of caching.  Well, you could reproduce what git does in a
> bunch of distributed algorithms and run your app on a farm--which, I guess, is
> probably what GitHub is doing--but throwing up a caching reverse proxy is a
> lot quicker if you can accept the caveats.
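
(For concreteness, the kind of short-expiry caching proxy suggested
above could be as simple as the sketch below; the gitweb backend
address, the listen port and the 5-minute TTL are assumed placeholders,
and in practice nginx or Varnish would do the same job from a few lines
of configuration.)

#!/usr/bin/env python3
# Minimal sketch of a short-expiry caching reverse proxy in front of a
# gitweb backend.  Backend address, listen port and TTL are assumptions.
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BACKEND = "http://127.0.0.1:8080"  # assumed gitweb backend
TTL = 300                          # pages go stale after 5 minutes
cache = {}                         # path -> (fetch time, body, content type)

class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time.time()
        entry = cache.get(self.path)
        if entry is None or now - entry[0] >= TTL:
            # First visitor (or expired entry) pays the generation cost.
            with urllib.request.urlopen(BACKEND + self.path) as resp:
                entry = (now, resp.read(),
                         resp.headers.get("Content-Type", "text/html"))
            cache[self.path] = entry
        _, body, ctype = entry
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8000), CachingProxy).serve_forever()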

I don't think GitHub / Gitorious / whatever have solved this problem
at all.  They're terribly slow on big repos, and some pages don't even
generate the first time you click on the link.

I'm totally fine with daily updates, but I think there still has to be
a better way of doing this than spending 0.5s of CPU time and 5s of
HDD time (if completely cold) on each blame / log page, even at the
price of more storage, some pre-caching, and (daily, in my use-case)
fine-grained incremental updates.
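
Something along these lines is roughly what I have in mind for the
daily incremental part: diff the commit the cache was last built
against with the new HEAD, and only re-warm the blame / log pages for
the paths that actually changed.  (The repository path, state file,
gitweb URL and query parameters below are only illustrative
assumptions.)

#!/usr/bin/env python3
# Rough sketch of incremental cache warming: only the pages for paths
# touched since the last run are fetched again.  Repository path, state
# file, gitweb URL and query parameters are illustrative assumptions.
import subprocess
import urllib.request
from pathlib import Path

REPO = "/srv/git/src.git"                      # assumed bare repository
STATE = Path("/var/cache/gitweb-last-commit")  # commit the cache was built at
GITWEB = "http://127.0.0.1:8080/?p=src.git"    # assumed gitweb base URL

def git(*args):
    return subprocess.run(["git", "-C", REPO, *args],
                          capture_output=True, text=True, check=True).stdout

def warm(url):
    # Same idea as the `wget` / `curl --head` trick: fetch the page once
    # so that whatever cache sits in front of gitweb is repopulated.
    urllib.request.urlopen(url).read()

old = STATE.read_text().strip() if STATE.exists() else None
new = git("rev-parse", "HEAD").strip()

if old and old != new:
    for path in git("diff", "--name-only", old, new).splitlines():
        warm(f"{GITWEB};a=blame;f={path}")    # per-file blame page
        warm(f"{GITWEB};a=history;f={path}")  # per-file log/history page

STATE.write_text(new + "\n")

After the first full run, only the handful of paths touched since the
previous day would need to be regenerated, instead of every page in the
repository.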

C.