On 21/11/24 8:42 am, Han Young wrote:
> On Thu, Nov 21, 2024 at 7:00 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> - We could also teach the server to "blame" a file for us and then
>>> teach the client to stitch together the server's result with the
>>> local findings, but this is more complicated.
>> Your local lazy repository, if you have anything you have to "stitch
>> together", would have your locally modified contents, and for you to
>> be able to make such modifications, it would also have at least the
>> blobs from HEAD, which you based your modifications on. So you
>> should be able to locally run "git blame @{u}.." to find the lines
>> for which your locally modified contents are to be blamed, ask the
>> other side to give you a blame for @{u}, and overlay the former on
>> top of the latter.
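
For what it's worth, the client-side half of that overlay already
works today: with a bottomed range, lines untouched since the
boundary are blamed on the boundary commit and marked with '^'.
A rough sketch, where the server request is the hypothetical part:

    # Blame only the local commits; any line not changed since @{u}
    # is attributed to the boundary commit and prefixed with '^'.
    git blame @{u}.. -- path/to/file

    # For each '^' line, substitute the server's answer for the same
    # line from its blame of the file at the upstream tip, i.e. the
    # result of running this on the server (transport not shown):
    #     git blame @{u} -- path/to/file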
> In $DAY_JOB, we modified the server to run blame for the client.
> To deal with changes not yet pushed to the server, we have the
> client pack the local-only blobs for the blamed file, along with
> the local-only commits that touch that file, into one packfile and
> send a 'remote-blame' request to the server. The server then
> unpacks the relevant objects into memory (by reusing code from
> git-unpack-objects), runs the blame, and returns the result to the
> client. This way we avoid both running blame twice and having to
> interleave the results.
>
> It works quite well in very large repos; with result caching, it
> can even be faster than a local blame on a full repo.
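
That matches what the plumbing can already express on the client
side. An untested sketch of the packing step ('remote-blame' being
your custom request, and the exact object selection an assumption
on my part):

    # Gather what the server cannot have: local-only commits that
    # touch the file, plus the trees and blobs of the file that
    # they carry, and stream them as a single packfile.
    git rev-list --objects @{u}..HEAD -- path/to/file |
    git pack-objects --stdout >local-blame.pack

    # local-blame.pack would then accompany the 'remote-blame'
    # request, for the server to unpack in memory.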
In a large partially cloned repo that I have, a `git blame` can take
several minutes and many network round trips.
Junio, would it make sense to add an option (and config) for `git
blame` that limits how far back in history it will go when fetching
blobs? This would prevent someone from accidentally firing several
cascading fetches as they open new files in an editor that runs git
blame by default (IntelliJ) or through popular plugins (GitLens for
VS Code), which can start multiple heavy git processes and bring a
user's system to a crawl.
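
Something along these lines; both the option name and the
configuration variable below are made up:

    git blame --fetch-limit=100 -- path/to/file
    git config blame.fetchLimit 100

In the meantime, bottoming the range by hand at least caps the
number of lazy fetches, since blame stops at the boundary instead
of walking (and fetching) further back:

    git blame HEAD~100.. -- path/to/file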