Re: git-blame extremely slow in partial clones due to serial object fetching

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 21 Nov 2024 07:55:24 +0900

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> Technically, we do need the contents, ...
> There are other ways:
>
>  - If we can teach the client to collect object IDs for prefetching,
>    perhaps it would be just as easy to teach the server. We could
>    instead make filter-by-path an acceptable argument to pass to "fetch
>    --filter", then teach the lazy fetch to use that argument. This also
>    opens the door to future performance improvements - since the server
>    has all the objects, it can give us precisely the objects that we
>    need, and not just give us a quantity of objects based on a heuristic
>    (so the client does not need to say "give me 10, and if I need more,
>    I'll ask you again", but can say "give me all I need to complete
>    the blame"). This, however, relies on server implementers to implement
>    and turn on such a feature.

This is an interesting half-way point, but I have a suspicion that
in order for the server side to give you all you need, the server
side has to do something close to computing the full blame.  Start
from a child commit plus the entire file as input, find blocks of
text in that entire file that are different in its parent (these are
the lines that are "blamed" to the child commit), pass control to
the same algorithm but using the parent commit plus the remainder of
the file (excluding the lines of text that have already been "blamed") as
the input, rinse and repeat, until the "remainder of the file"
shrinks to empty.  When everything is "blamed", you know you can
stop.

So, a server that can give you something better than a heuristic
would have spent enough cycles to know the final result of "blame"
by the time it knows where it should/can stop, wouldn't it?
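To make the loop above concrete, here is a minimal Python sketch. It is
a toy model, not git's implementation. It assumes a strictly linear
history (every commit has exactly one parent), keeps whole file
contents in memory as lists of lines, and uses the standard library's
difflib in place of git's own diff machinery. The `blame` function and
the `history` layout are hypothetical. Besides the per-line answer, it
records which revisions it had to read, which is exactly the list a
server would need to produce in order to hand the client "all it needs".

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical layout: history[0] is the newest commit, and each later entry
# is the (single) parent of the one before it; contents are lists of lines.
History = List[Tuple[str, List[str]]]


def blame(history: History) -> Tuple[List[str], List[str]]:
    """Toy line-passing blame over a linear history (not git's algorithm).

    Returns (commit id for each line of the newest version, commits read).
    """
    child_id, child_lines = history[0]
    result: List[str] = [""] * len(child_lines)
    # "Remainder of the file": line index in the current child version ->
    # line index in the newest version, for lines not yet blamed.
    remainder: Dict[int, int] = {i: i for i in range(len(child_lines))}
    visited = [child_id]

    for parent_id, parent_lines in history[1:]:
        if not remainder:
            break  # everything has been blamed; older commits are never read
        visited.append(parent_id)
        matcher = SequenceMatcher(None, parent_lines, child_lines, autojunk=False)
        next_remainder: Dict[int, int] = {}
        matched = set()
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                if b + k in remainder:
                    # Line is unchanged from the parent: pass it down.
                    next_remainder[a + k] = remainder[b + k]
                    matched.add(b + k)
        # Lines that differ from the parent are blamed on the child.
        for child_idx, final_idx in remainder.items():
            if child_idx not in matched:
                result[final_idx] = child_id
        child_id, child_lines, remainder = parent_id, parent_lines, next_remainder

    # Whatever is left when history runs out belongs to the oldest commit read.
    for final_idx in remainder.values():
        result[final_idx] = child_id
    return result, visited


history: History = [
    ("c3", ["a", "B", "c", "d"]),  # changes the second line
    ("c2", ["a", "b", "c", "d"]),  # appends "d"
    ("c1", ["a", "b", "c"]),       # rewrites the whole file
    ("c0", ["x"]),
    ("root", ["x", "y"]),
]
lines_blame, visited = blame(history)
print(lines_blame)  # ['c1', 'c3', 'c1', 'c2']
print(visited)      # ['c3', 'c2', 'c1', 'c0']
```

In this example the walk stops after reading c0 and never touches
"root". However, that stopping point only becomes known as a side effect
of attributing every line, which means computing the blame itself.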

>  - We could also teach the server to "blame" a file for us and then
>    teach the client to stitch together the server's result with the
>    local findings, but this is more complicated.

Your local lazy repository, if it has anything to "stitch together",
would have your locally modified contents, and for you to be able to
make such modifications, it would also have at least the blobs from
HEAD, on which you based your modifications.  So you should be able
to run "git blame @{u}.." locally to find the lines of your locally
modified contents that are to be blamed on your local changes, ask
the other side to give you a blame for @{u}, and overlay the former
on top of the latter.
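Here is a minimal sketch of that overlay. It assumes the local blame
has already been turned into a list of (commit, source line) pairs. In
that list, every line that predates @{u} is attributed to a single
boundary marker together with its 0-based line number in the @{u}
version, which presumes the local history is linear on top of @{u}. It
also assumes the server's answer for @{u} arrives as one commit per
line. The `overlay` function, the `BOUNDARY` marker, and both data
layouts are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical marker for lines that "git blame @{u}.." could not pin on any
# local change, i.e. lines that already existed in the @{u} version.
BOUNDARY = "^@{u}"


def overlay(local: List[Tuple[str, int]], upstream: List[str]) -> List[str]:
    """Lay a local blame of @{u}.. on top of a (hypothetical) server blame of @{u}.

    local[i] is (commit, src_line) for line i of the locally modified file;
    for BOUNDARY entries, src_line is the 0-based line number in @{u}.
    upstream[j] is the commit the server blames line j of @{u} on.
    """
    combined = []
    for commit, src_line in local:
        if commit == BOUNDARY:
            combined.append(upstream[src_line])  # defer to the server's answer
        else:
            combined.append(commit)  # a local change already explains this line
    return combined


# Server's (assumed) answer for a three-line file at @{u}.
upstream_blame = ["u1", "u2", "u1"]
# Local blame: one line inserted by local commit "l1" after the first line.
local_blame = [(BOUNDARY, 0), ("l1", 1), (BOUNDARY, 1), (BOUNDARY, 2)]
print(overlay(local_blame, upstream_blame))  # ['u1', 'l1', 'u2', 'u1']
```

In this sketch, local changes always win. A line touched in @{u}..HEAD
never needs the server's answer, so only the boundary lines cost a
lookup into the upstream blame.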