git-blame extremely slow in partial clones due to serial object fetching

When git-blame runs in a partial clone (--filter=blob:none), it fetches
missing blob objects one at a time. This can result in thousands of serial fetch
operations, making blame extremely slow even when network latency is low.

For example, in one large repository, blaming a single large file required
fetching about 6500 objects. Because each fetch requires its own round-trip,
the operation would have taken on the order of an hour to complete.
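
As a rough check on that estimate, the arithmetic can be sketched as follows.
The 0.5 s per-fetch cost is an assumption picked for illustration (it is meant
to cover process startup, negotiation, and the round-trip itself); it is not a
measured value from the repository above.

```python
# Back-of-the-envelope estimate for serial blob fetching.
# ASSUMPTION: 0.5 s per fetch is illustrative, not a measurement.

def serial_fetch_seconds(num_objects: int, seconds_per_fetch: float) -> float:
    """Total wall time when every object costs one full fetch."""
    return num_objects * seconds_per_fetch


total = serial_fetch_seconds(6500, 0.5)
print(f"{total / 60:.0f} minutes")  # prints "54 minutes"
```

The total scales linearly with the object count, so the only ways to shrink it
are a cheaper fetch or fewer fetches.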

The core issue appears to be in fill_origin_blob(), which is called
individually for each blob needed during the blame process. While the blame
algorithm does need blob contents to make detailed line-matching decisions,
it seems like we don't necessarily need the contents just to determine which
blobs we'll examine.

It seems like this could be optimized by batch-fetching the needed objects
upfront, rather than fetching them one at a time. This would convert O(n)
round-trips into a small number of batch fetches.
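
To make the difference concrete, here is a toy model that only counts
round-trips. FakeRemote, blame_serial, and blame_batched are hypothetical names
invented for this sketch and do not correspond to anything in git's source. It
also assumes the list of needed blob IDs can be known before any contents are
read, which is exactly the point I'm unsure about above.

```python
# Toy model only: counts round-trips for serial vs. batched fetching.
# All names here are hypothetical and are not taken from git.

class FakeRemote:
    """Stands in for a promisor remote; each fetch() call is one round-trip."""

    def __init__(self):
        self.round_trips = 0

    def fetch(self, oids):
        self.round_trips += 1
        return {oid: f"contents of {oid}" for oid in oids}


def blame_serial(remote, needed_oids):
    """Fetch each missing blob at the moment it is needed."""
    blobs = {}
    for oid in needed_oids:
        blobs.update(remote.fetch([oid]))
    return blobs


def blame_batched(remote, needed_oids, batch_size=1000):
    """Collect the blob IDs first, then fetch them in a few large requests."""
    blobs = {}
    for start in range(0, len(needed_oids), batch_size):
        blobs.update(remote.fetch(needed_oids[start:start + batch_size]))
    return blobs


oids = [f"blob{i}" for i in range(6500)]

serial_remote = FakeRemote()
blame_serial(serial_remote, oids)

batched_remote = FakeRemote()
blame_batched(batched_remote, oids)

print(serial_remote.round_trips, batched_remote.round_trips)  # 6500 7
```

Both variants end up with the same set of blobs; only the number of requests
changes. The batch size of 1000 is arbitrary.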

Reproduction:
1. Create a partial clone with --filter=blob:none
2. Run git blame on a file with significant history
3. Observe serial fetching of objects in the trace output
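
The steps above could be scripted roughly like this. The URL, clone directory,
and file path are placeholders you would need to supply, and the remote has to
allow filtered clones. Using GIT_TRACE=1 is my assumption about the simplest
way to see the individual fetches; other trace settings may show them more
clearly.

```python
import os
import subprocess


def repro_commands(url, clone_dir, path):
    """Return the two git invocations from the steps above as argv lists."""
    return [
        ["git", "clone", "--filter=blob:none", url, clone_dir],
        ["git", "-C", clone_dir, "blame", path],
    ]


def run_repro(url, clone_dir, path):
    """Run the clone and the blame with command tracing enabled."""
    # GIT_TRACE=1 makes git log the commands it runs to stderr. The
    # assumption is that each on-demand fetch appears as its own entry.
    env = dict(os.environ, GIT_TRACE="1")
    for argv in repro_commands(url, clone_dir, path):
        subprocess.run(argv, env=env, check=True)
```

Calling run_repro() with a repository URL, a scratch directory, and the path
of a file with a long history should print the trace alongside the blame
output.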

Let me know if you need any additional information to investigate this issue.

—burke



