Re: [PATCH 00/20] pack-revindex: prepare for on-disk reverse index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 08, 2021 at 01:16:39PM -0500, Taylor Blau wrote:

> Generating the reverse index in memory for repositories with large packs has two
> significant drawbacks:
> 
>   - It requires allocating sizeof(struct revindex_entry) per packed object.
> 
>   - It requires us to sort the entries by their pack offset. This is implemented
>     in sort_revindex() using a radix sort, but still takes considerable time (as
>     benchmarks found in the second series demonstrate).

Or thinking about it more fundamentally: any operation which touches the
revindex is now O(nr_objects_in_repo), even if it only cares about a few
objects. Ideally this will eventually be this O(log nr_objects_in_repo);
we can't do much better than that because of object lookups (unless we
replace the .idx with a perfect hash or something).

> The goal of this series is to remove direct access of the `struct
> revindex_entry` type, as well as `struct packed_git`'s `revindex` field. The
> on-disk format will be mmap'd and accessed directly, but the format is
> sufficiently different that the whole `revindex` array can't be written as-is.

It looks good overall to me. I left a few nits around documentation and
integer types that I think are worth a re-roll, but I think after
addressing those it should be good.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux