Re: git blame <directory> [was: Reducing CPU load on git server]

On Tue, Aug 30, 2016 at 12:46:20PM +0200, Jakub Narębski wrote:

> On 29.08.2016 at 23:31, Jeff King wrote:
> 
> > Blame-tree is a GitHub-specific command (it feeds the main repository
> > view page), and is a known CPU hog. There's more clever caching for that
> > coming down the pipe, but it's not shipped yet.
> 
> I wonder if having support for 'git blame <directory>' in Git core would
> be something interesting to Git users.  I once tried to implement it,
> but it went nowhere.  Would it be hard to implement?

I think there's some interest; I have received a few off-list emails
over the years about it. There was some preliminary discussion long ago:

  http://public-inbox.org/git/20110302164031.GA18233@xxxxxxxxxxxxxxxxxxxxx/

The code that runs on GitHub is available in my fork of git. I haven't
submitted it upstream because there are some lingering issues. I
mentioned them on-list in the first few items of:

  http://public-inbox.org/git/20130318121243.GC14789@xxxxxxxxxxxxxxxxxxxxx/

That code is in the jk/blame-tree branch of https://github.com/peff/git
if you are interested in addressing them (note that I haven't touched
that code in a few years except for rebasing it forward, so it may have
bitrotted a little).

Here's a snippet from an off-list conversation I had with Dennis (cc'd)
in 2014 (I think he uses that blame-tree code as part of a custom git
web interface):

> The things I think it needs are:
> 
>   1. The max-depth patches need to be reconciled with Duy's pathspec
>      work upstream. The current implementation works only for tree
>      diffs, and is not really part of the pathspec at all.
> 
>   2. Docs/output formats for blame-tree need to be friendlier, as you
>      noticed.
> 
>   3. Blame-tree does not use revision pathspecs at all. This makes it
>      take way longer than it could, because it does not prune away side
>      branches deep in history that affect only paths whose blame we have
>      already found. But the current pathspec code is so slow that using
>      it outweighs the pruning benefit.
> 
>      I have a series, which I just pushed up to jk/faster-blame-tree,
>      which tries to improve this.  But it's got a lot of new, untested
>      code itself (we are not even running it at GitHub yet). It's also
>      based on v1.9.4; I think there are going to be a lot of conflicts
>      with the combine-tree work done in v2.0.
> 
> [...]
> 
> I also think it would probably make sense for blame-tree to support the
> same output formats as git-blame (e.g., to have an identical --porcelain
> mode, to have a reasonable human-readable format by default, etc).
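
For readers unfamiliar with the idea, here is a minimal sketch of what a
blame-tree walk computes: for every path in the tip tree, the commit that
last changed it. This is not the code from the jk/blame-tree branch. The
names Commit and blame_tree and the sample history are hypothetical, and
the model assumes a strictly linear history with a flat tree, so the merge
handling and side-branch pruning described in item 3 do not arise. It only
shows the early exit once every path has been blamed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    """Toy commit (hypothetical model): one optional parent, flat {path: blob_id} tree."""
    parent: Optional[str]
    tree: Dict[str, str]


def blame_tree(commits: Dict[str, Commit], tip: str) -> Dict[str, str]:
    """Map each path in the tip tree to the commit that last changed it.

    Assumes linear history (no merges); real history needs per-parent handling.
    """
    pending = set(commits[tip].tree)  # paths still waiting for an answer
    result: Dict[str, str] = {}
    current: Optional[str] = tip
    # Stop as soon as every path is blamed; older history is never visited.
    while current is not None and pending:
        commit = commits[current]
        parent_tree = commits[commit.parent].tree if commit.parent else {}
        for path in sorted(pending):
            # The entry differs from the parent, so this commit introduced it.
            if commit.tree.get(path) != parent_tree.get(path):
                result[path] = current
                pending.discard(path)
        current = commit.parent
    return result


# Hypothetical three-commit history, oldest first.
commits = {
    "c1": Commit(None, {"README": "r1", "Makefile": "m1"}),
    "c2": Commit("c1", {"README": "r2", "Makefile": "m1"}),
    "c3": Commit("c2", {"README": "r2", "Makefile": "m1", "blame.c": "b1"}),
}
print(blame_tree(commits, "c3"))
```

In this toy history, blame.c is attributed to c3, README to c2, and
Makefile to c1. The cost on a real repository comes from how far back the
walk must go before the last pending path is resolved, which is why
limiting the traversal matters.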

That's all I could dig out of my archives. I'd be happy if somebody
wanted to pick it up and run with it. Polishing for upstream has been on
my list for several years now, but there's usually something more
important (or interesting) to work on at any given moment.

You might also look at how GitLab does it. I've never talked to them
about it, and as far as I know they do not use blame-tree.

-Peff


