Re: [External] Re: git-blame extremely slow in partial clones due to serial object fetching

Han Young <hanyang.tony@xxxxxxxxxxxxx> · Thu, 21 Nov 2024 11:12:12 +0800

On Thu, Nov 21, 2024 at 7:00 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >  - We could also teach the server to "blame" a file for us and then
> >    teach the client to stitch together the server's result with the
> >    local findings, but this is more complicated.
>
> Your local lazy repository, if you have anything you have to "stitch
> together", would have your locally modified contents, and for you to
> be able to make such modifications, it would also have at least the
> blobs from HEAD, which you based your modifications on.  So you
> should be able to locally run "git blame @{u}.." to find lines that
> your locally modified contents are to be blamed, ask the other side
> to give you a blame for @{u}, and overlay the former on top of the
> latter.
>
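
A minimal Python sketch of the overlay Junio describes. It assumes the
local `git blame @{u}..` result has already been parsed (for instance
from `--porcelain` output) into one record per line, that @{u} is the
only boundary commit, and that the file was not renamed. `LocalLine`
and `overlay_blame` are hypothetical names. A line where blame stopped
at the boundary carries its line number in the @{u} version of the
file, which is exactly the index needed into the server's blame of @{u}.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalLine:
    """One parsed line of a local `git blame @{u}..` run (hypothetical shape)."""
    commit: str        # commit the local blame attributed this line to
    boundary: bool     # True if blame stopped at @{u} (shown with a ^ prefix)
    orig_lineno: int   # 1-based line number in that commit's version of the file


def overlay_blame(local: List[LocalLine], upstream: List[str]) -> List[str]:
    """Return one commit per line of the local version of the file.

    upstream[i] is the commit the server blamed for line i+1 of the file at @{u}.
    """
    result = []
    for line in local:
        if line.boundary:
            # Unchanged since @{u}: take the server's answer for that line.
            result.append(upstream[line.orig_lineno - 1])
        else:
            # Introduced by a local commit: keep the local answer.
            result.append(line.commit)
    return result
```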

In $DAY_JOB, we modified the server to run blame for the client.
To deal with changes not yet pushed to the server, we let the client
pack the local-only blobs for the blamed file, along with the
local-only commits that touch that file, into one packfile and send a
'remote-blame' request to the server.
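
The client-side selection described above can be sketched over a toy
in-memory commit graph. This is an illustration under assumptions, not
the $DAY_JOB code: `Commit`, `touches` and `remote_blame_payload` are
hypothetical, each commit maps paths straight to blob ids in place of
real trees, and a commit counts as touching the file when its blob
differs from every parent's (so a merge that keeps one side's version
is skipped). In a real repository the commit list roughly corresponds
to `git rev-list @{u}..HEAD -- <path>`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Commit:
    parents: List[str]
    files: Dict[str, str]  # path -> blob id (stands in for the commit's tree)


def reachable(graph: Dict[str, Commit], tip: str) -> Set[str]:
    """All commits reachable from tip, including tip itself."""
    seen: Set[str] = set()
    todo = deque([tip])
    while todo:
        cid = todo.popleft()
        if cid in seen:
            continue
        seen.add(cid)
        todo.extend(graph[cid].parents)
    return seen


def touches(graph: Dict[str, Commit], cid: str, path: str) -> bool:
    """True if the commit's version of path differs from every parent's."""
    commit = graph[cid]
    blob = commit.files.get(path)
    if not commit.parents:
        return blob is not None
    return all(graph[p].files.get(path) != blob for p in commit.parents)


def remote_blame_payload(graph: Dict[str, Commit], head: str, upstream: str,
                         path: str) -> Tuple[List[str], Set[str]]:
    """Local-only commits touching path, and their blobs the server lacks."""
    upstream_set = reachable(graph, upstream)
    local_only = reachable(graph, head) - upstream_set
    commits = sorted(c for c in local_only if touches(graph, c, path))
    known = {graph[c].files.get(path) for c in upstream_set}
    blobs = {graph[c].files[path] for c in commits
             if graph[c].files.get(path) is not None} - known
    return commits, blobs
```

Skipping blobs already present upstream keeps the packfile small; for
example, a local commit that reverts the file to its @{u} content adds
a commit to the pack but no blob.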

The server then unpacks the relevant objects into memory
(by reusing code from git-unpack-objects), runs the blame, and returns
the result to the client. This way we avoid running blame on both
sides and interleaving the results.
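
On the server side, "unpack into memory" can be pictured as an object
lookup layer that consults the objects received with the request before
the repository's own store, and is discarded afterwards. The sketch is
hypothetical (`OverlayObjectStore` is not a git API) and models the
on-disk store as a plain mapping. Object ids are computed the way git
does for SHA-1 repositories: hash a "<type> <size>\0" header followed
by the content.

```python
import hashlib
from typing import Dict, Mapping, Optional, Tuple

Obj = Tuple[str, bytes]  # (object type, raw content)


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 object id as git computes it: header plus content."""
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


class OverlayObjectStore:
    """Request-scoped objects layered over the repository's object store."""

    def __init__(self, on_disk: Mapping[str, Obj]):
        self._on_disk = on_disk               # never written to
        self._in_memory: Dict[str, Obj] = {}  # objects from the client's pack

    def add(self, obj_type: str, data: bytes) -> str:
        oid = object_id(obj_type, data)
        self._in_memory[oid] = (obj_type, data)
        return oid

    def read(self, oid: str) -> Optional[Obj]:
        # Client-supplied objects first, then the repository itself.
        if oid in self._in_memory:
            return self._in_memory[oid]
        return self._on_disk.get(oid)
```

Because nothing is written to the repository, dropping the store at the
end of the request is all the cleanup needed.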

It works quite well in very large repos; with result caching, it
can be even faster than running blame locally on a full clone.