On Fri, Jan 08, 2021 at 01:16:39PM -0500, Taylor Blau wrote: > Generating the reverse index in memory for repositories with large packs has two > significant drawbacks: > > - It requires allocating sizeof(struct revindex_entry) per packed object. > > - It requires us to sort the entries by their pack offset. This is implemented > in sort_revindex() using a radix sort, but still takes considerable time (as > benchmarks found in the second series demonstrate). Or thinking about it more fundamentally: any operation which touches the revindex is now O(nr_objects_in_repo), even if it only cares about a few objects. Ideally this will eventually be this O(log nr_objects_in_repo); we can't do much better than that because of object lookups (unless we replace the .idx with a perfect hash or something). > The goal of this series is to remove direct access of the `struct > revindex_entry` type, as well as `struct packed_git`'s `revindex` field. The > on-disk format will be mmap'd and accessed directly, but the format is > sufficiently different that the whole `revindex` array can't be written as-is. It looks good overall to me. I left a few nits around documentation and integer types that I think are worth a re-roll, but I think after addressing those it should be good. -Peff