On 7 June 2013 10:57, Fredrik Gustafsson <iveqy@xxxxxxxxx> wrote:
> On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
>> On 6 June 2013 23:33, Fredrik Gustafsson <iveqy@xxxxxxxxx> wrote:
>> > On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
>> >> I'm interested in running a web interface to this and other similar
>> >> git repositories (the FreeBSD and NetBSD git repositories are even
>> >> much, much bigger).
>> >>
>> >> Software-wise, is there no way to make cold access for git-log and
>> >> git-blame take orders of magnitude less than ~5s, and warm access
>> >> less than ~0.5s?
>> >
>> > The obvious way would be to cache the results. You can even put an
>>
>> That would do nothing to prevent the slowness of the cold requests,
>> which already run for 5s when completely cold.
>>
>> In fact, unless done right, it would actually slow things down, as
>> lines would not necessarily show up as they become ready.
>
> You need to cache this _before_ the web request. Don't let the web
> request trigger a cache update; let a git push to the repository
> trigger it instead.
>
>> > update-cache hook on the git repositories to keep the cache always
>> > up to date.
>>
>> That's entirely inefficient. It'll probably take hours or days to
>> pre-cache all the HTML pages with a naive wget and the list of all
>> the files. Not a solution at all.
>>
>> (0.5s x 35k files = ~5 hours of CPU time for the log pages, plus
>> another ~5 hours for the blame pages.)
>
> That's a one-time penalty. Why would that be a problem? And why is wget
> even mentioned? Did we misunderstand each other?

`wget` or `curl --head` would be used to trigger the caching.

I don't understand how it's a one-time penalty. No one wants to look at
an old copy of the repository, so, if, say, I want to have a gitweb of
all 4 BSDs, updated daily, then, even with lots of RAM (e.g. to
eliminate the cold-case 5s penalty and reduce each page to 0.5s), on a
quad-core box I'd be kinda lucky to complete a generation of all the
pages within 12h or so, obviously using the machine at, or above, 50%
capacity just for the caching.

Or several days, or even a couple of weeks, on an Intel Atom or VIA
Nano with 2GB of RAM or so.

Obviously not acceptable; there has to be a better solution.

One could, I guess, only regenerate the pages which have changed, but
it still sounds like an ugly solution, where you'd have to be
generating a list of files that have changed between one generation and
the next, and you'd still have very high CPU, cache and storage
requirements (a rough sketch of such a hook is further down in this
message).

C.

>> > There are some dynamic web frontends like cgit and gitweb out there,
>> > but there are also static ones like git-arr
>> > ( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
>> > for you.
>>
>> The concept of git-arr looks interesting, but it has neither blame
>> nor log, so it's kinda pointless, because the whole thing that's slow
>> is exactly blame and log.
>>
>> There has to be some way to improve these matters. No one wants to
>> wait 5 seconds for a page to be generated; we're not running
>> enterprise software here, latency is important!
>>
>> C.
>
> Git's internal structures make blame in particular pretty expensive.
> There's nothing you can really do about it algorithm-wise (as far as I
> know; if there were, people would already have improved it).
>
> The solution here is to have a "hot" repository to speed things up.
>
> There are, of course, little things you can do. I imagine that using
> git repack in a sane way could probably speed things up, as well as
> git gc.
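
Repacking is certainly worth trying first. Something along these lines
should get everything into a single, tightly-deltified pack (just a
sketch; the --depth/--window numbers are a guess and worth
experimenting with on a repository this size):

    # One-off housekeeping; a tighter pack can shorten the revision
    # walks that log/blame have to do.
    git gc --aggressive --prune=now

    # ...or tune the repack directly instead of relying on gc's defaults:
    git repack -a -d -f --depth=50 --window=250

    # Sanity check: ideally one pack file and no loose objects remain.
    git count-objects -v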
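
As for the "update cache hook" idea: the least bad variant I can think
of is a post-receive hook that re-warms only the pages whose files were
touched by the push, roughly like this. Untested sketch; the base URL
and the gitweb-style ?p=...;a=blame;f=... layout are placeholders for
whatever frontend is actually in use, and the paths would still need
proper URL-escaping:

    #!/bin/sh
    # post-receive: re-warm cached blame/history pages only for the
    # paths that actually changed in this push.
    BASE_URL='http://example.org/gitweb.cgi?p=openbsd.git'   # placeholder

    while read oldrev newrev refname; do
            # Only care about branch updates, not tags and the like.
            case "$refname" in refs/heads/*) ;; *) continue ;; esac

            git diff --name-only "$oldrev" "$newrev" |
            while read -r path; do
                    # Requesting the page is enough to make the
                    # frontend regenerate and cache it.
                    curl -s -o /dev/null "$BASE_URL;a=blame;f=$path"
                    curl -s -o /dev/null "$BASE_URL;a=history;f=$path"
            done
    done

That keeps the per-push cost proportional to the size of the push
instead of the size of the tree, but it still does nothing for the
initial full generation, which is the part I find unacceptable.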
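
For anyone who wants to reproduce the cold-vs-warm numbers I quoted: on
a Linux box something like this shows the difference (needs root to
drop the page cache; the path is only an example):

    # Cold: flush the kernel page cache first.
    sync && echo 3 > /proc/sys/vm/drop_caches
    time git blame sys/kern/kern_exit.c > /dev/null

    # Warm: run it again with everything cached.
    time git blame sys/kern/kern_exit.c > /dev/null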
>
> --
> Best regards,
> Fredrik Gustafsson
>
> tel: 0733-608274
> e-mail: iveqy@xxxxxxxxx