On 7 June 2013 10:57, Fredrik Gustafsson <iveqy@xxxxxxxxx> wrote:
> On Fri, Jun 07, 2013 at 10:05:37AM -0700, Constantine A. Murenin wrote:
>> On 6 June 2013 23:33, Fredrik Gustafsson <iveqy@xxxxxxxxx> wrote:
>> > On Thu, Jun 06, 2013 at 06:35:43PM -0700, Constantine A. Murenin wrote:
>> >> I'm interested in running a web interface to this and other similar
>> >> git repositories (the FreeBSD and NetBSD git repositories are even
>> >> much, much bigger).
>> >>
>> >> Software-wise, is there no way to make cold access for git-log and
>> >> git-blame take orders of magnitude less than ~5s, and warm access
>> >> less than ~0.5s?
>> >
>> > The obvious way would be to cache the results. You can even put an
>>
>> That would do nothing to prevent the slowness of the cold requests,
>> which already run for 5s when completely cold.
>>
>> In fact, unless done right, it would actually slow things down, as
>> lines would not necessarily show up as they become ready.
>
> You need to cache this _before_ the web request. Don't let the web
> request trigger a cache update; let a git push to the repository
> trigger it instead.
>
>> > update-cache hook on the git repositories to keep the cache always
>> > up to date.
>>
>> That's entirely inefficient. It'll probably take hours or days to
>> pre-cache all the HTML pages with a naive wget and the list of all
>> the files. Not a solution at all.
>>
>> (0.5s x 35k files = ~5 hours of CPU time for the log pages, plus
>> another ~5 hours for the blame pages.)
>
> That's a one-time penalty. Why would that be a problem? And why is wget
> even mentioned? Did we misunderstand each other?

`wget` or `curl --head` would be used to trigger the caching.

I don't understand how it's a one-time penalty. No one wants to look at
an old copy of the repository, so, if, say, I want to have a gitweb of
all 4 BSDs, updated daily, then, even with lots of RAM (e.g. to
eliminate the cold-case 5s penalty and reduce each page to 0.5s), on a
quad-core box I'd be kinda lucky to complete a generation of all the
pages within 12h or so, obviously using the machine at, or above, 50%
capacity just for the caching.

Or several days, or even a couple of weeks, on an Intel Atom or VIA
Nano with 2GB of RAM or so.

Obviously not acceptable; there has to be a better solution.

One could, I guess, only regenerate the pages which have changed, but
it still sounds like an ugly solution, where you'd have to be
generating a list of files that have changed between one generation and
the next, and you'd still have very high CPU, cache and storage
requirements (a rough sketch of such a hook is further down in this
message).

C.

>> > There are some dynamic web frontends like cgit and gitweb out there,
>> > but there are also static ones like git-arr
>> > ( http://blitiri.com.ar/p/git-arr/ ) that might be more of an option
>> > for you.
>>
>> The concept of git-arr looks interesting, but it has neither blame
>> nor log, so it's kinda pointless, because the whole thing that's slow
>> is exactly blame and log.
>>
>> There has to be some way to improve these matters. No one wants to
>> wait 5 seconds for a page to be generated; we're not running
>> enterprise software here, latency is important!
>>
>> C.
>
> Git's internal structures make blame in particular pretty expensive.
> There's nothing you can really do about it algorithm-wise (as far as I
> know; if there were, people would already have improved it).
>
> The solution here is to have a "hot" repository to speed things up.
>
> There are, of course, little things you can do. I imagine that using
> git repack in a sane way could probably speed things up, as well as
> git gc.
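
Repacking is certainly worth trying first. Something along these lines
should get everything into a single, tightly-deltified pack (just a
sketch; the --depth/--window numbers are a guess and worth
experimenting with on a repository this size):

    # One-off housekeeping; a tighter pack can shorten the revision
    # walks that log/blame have to do.
    git gc --aggressive --prune=now

    # ...or tune the repack directly instead of relying on gc's defaults:
    git repack -a -d -f --depth=50 --window=250

    # Sanity check: ideally one pack file and no loose objects remain.
    git count-objects -v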
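
As for the "update cache hook" idea: the least bad variant I can think
of is a post-receive hook that re-warms only the pages whose files were
touched by the push, roughly like this. Untested sketch; the base URL
and the gitweb-style ?p=...;a=blame;f=... layout are placeholders for
whatever frontend is actually in use, and the paths would still need
proper URL-escaping:

    #!/bin/sh
    # post-receive: re-warm cached blame/history pages only for the
    # paths that actually changed in this push.
    BASE_URL='http://example.org/gitweb.cgi?p=openbsd.git'   # placeholder

    while read oldrev newrev refname; do
            # Only care about branch updates, not tags and the like.
            case "$refname" in refs/heads/*) ;; *) continue ;; esac

            git diff --name-only "$oldrev" "$newrev" |
            while read -r path; do
                    # Requesting the page is enough to make the
                    # frontend regenerate and cache it.
                    curl -s -o /dev/null "$BASE_URL;a=blame;f=$path"
                    curl -s -o /dev/null "$BASE_URL;a=history;f=$path"
            done
    done

That keeps the per-push cost proportional to the size of the push
instead of the size of the tree, but it still does nothing for the
initial full generation, which is the part I find unacceptable.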
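
For anyone who wants to reproduce the cold-vs-warm numbers I quoted: on
a Linux box something like this shows the difference (needs root to
drop the page cache; the path is only an example):

    # Cold: flush the kernel page cache first.
    sync && echo 3 > /proc/sys/vm/drop_caches
    time git blame sys/kern/kern_exit.c > /dev/null

    # Warm: run it again with everything cached.
    time git blame sys/kern/kern_exit.c > /dev/null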
>
> --
> Best regards,
> Fredrik Gustafsson
>
> tel: 0733-608274
> e-mail: iveqy@xxxxxxxxx