Re: Reduce pack-objects memory footprint?

Duy Nguyen <pclouds@xxxxxxxxx> · Fri, 2 Mar 2018 21:38:34 +0700

On Fri, Mar 2, 2018 at 5:54 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, Mar 02, 2018 at 05:18:45PM +0700, Duy Nguyen wrote:
>
>> On Wed, Feb 28, 2018 at 4:27 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>> > linux-2.6.git currently has 6483999 objects. "git gc" on my poor laptop
>> > consumes 1.7G out of 4G RAM, pushing lots of data to swap and making
>> > all apps nearly unusable (granted the problem is partly Linux I/O
>> > scheduler too). So I wonder if we can reduce pack-objects memory
>> > footprint a bit.
>>
>> Next low-hanging fruit:
>>
>> struct revindex_entry {
>>         off_t offset;
>>         unsigned int nr;
>> };
>>
>> We need one entry per object, so 6.5M objects * 16 bytes = 104 MB (on
>> 64-bit, an 8-byte off_t plus a 4-byte nr, padded to 16 bytes). If we
>> break this struct apart and store two arrays of offset and nr in
>> struct packed_git, we drop the 4 bytes of padding per entry, saving
>> 26 MB total.
>>
>> The savings are getting smaller, but every megabyte counts for me, and
>> it does not look like breaking this struct will make horrible code (we
>> recreate the struct at find_pack_revindex()), so I'm going to do this
>> too unless someone objects. There will be a slight performance
>> regression due to cache effects, but hopefully it's ok.
>
> Maybe you will prove me wrong, but I don't think splitting them is going
> to work. The point of the revindex_entry is that we sort the (offset,nr)
> tuple as a unit.
>
> Or are you planning to sort it, and then copy the result into two
> separate arrays? I think that would work, but it sounds kind of nasty
> (arcane code, and extra CPU work for systems that don't care about the
> 26MB).

How about a two-level lookup? Something like:

struct revindex_entry_l2 {
        uint32_t offset; /* the lowest 32 bits */
        uint32_t nr;
};

struct revindex {
        struct revindex_entry_l2 *level1[256]; /* 8 high bits */
};

This limits us to 1024GB pack files (8 high bits + 32 low bits = 40
bits of offset), which should give us some time before we have to worry
about it again. Most of the time we'll need just one or two items in
level1[] (a pack smaller than 4GB only uses level1[0]), so cache effects
should not be that bad. Preparing/sorting this could be a problem,
though.
-- 
Duy