Re: [PATCH 06/15] run-job: auto-size or use custom pack-files batch

On 4/30/2020 12:48 PM, Son Luong Ngoc wrote:
> Hi Derrick,
> 
> I have been reviewing these jobs' mechanics closely and have some questions:
> 
>> The dynamic default size is computed with this idea in mind for
>> a client repository that was cloned from a very large remote: there
>> is likely one "big" pack-file that was created at clone time. Thus,
>> do not try repacking it as it is likely packed efficiently by the
>> server. Instead, try packing the other pack-files into a single
>> pack-file.
>>
>> The size is then computed as follows:
>>
>> batch size = total size - max pack size
> 
> Could you please elaborate on why this is the best value?

The intention was to repack everything _except_ the biggest pack,
but clearly that doesn't always work. The repack logic "guesses" the
size of the resulting pack, and that estimate doesn't always reach the
batch size, in which case nothing happens. More investigation is
required here.
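
To make the failure mode concrete, here is a rough sketch of the
arithmetic. It works on pack sizes passed as arguments rather than on a
real repository. The would_repack function is a hypothetical,
simplified model of the selection (an assumption for illustration, not
the actual multi-pack-index code): a pack is a candidate when its
estimated size is below the batch size, and the repack goes ahead only
when at least two candidates together reach the batch size.

```shell
#!/bin/sh
# Sketch only: sizes (in bytes) are given as arguments.

# batch size = total size - max pack size
batch_size() {
    local total=0 max=0 s
    for s in "$@"; do
        total=$((total + s))
        if [ "$s" -gt "$max" ]; then max=$s; fi
    done
    echo $((total - max))
}

# Hypothetical model of the repack decision (an assumption, not git's code).
# Arguments: the batch size, then the estimated size of each pack.
would_repack() {
    local batch=$1 sum=0 count=0 s
    shift
    for s in "$@"; do
        # Only packs estimated to be smaller than the batch are candidates.
        if [ "$s" -lt "$batch" ]; then
            sum=$((sum + s))
            count=$((count + 1))
        fi
    done
    if [ "$count" -ge 2 ] && [ "$sum" -ge "$batch" ]; then
        echo yes
    else
        echo no
    fi
}
```

With on-disk sizes 1000, 40, 30 and 20, batch_size gives 90. Under this
model, if the estimate for the 40-byte pack comes out as 38, the
candidates sum to 88, which is below 90, so would_repack answers "no".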

> In practice I have been testing this out with the following
> 
>> % cat debug.sh
>> #!/bin/bash
>>
>> temp=$(du -cb .git/objects/pack/*.pack)
>>
>> total_size=$(echo "$temp" | grep total | awk '{print $1}')
>> echo total_size
>> echo $total_size
>>
>> biggest_pack=$(echo "$temp" | sort -n | tail -2 | head -1 | awk '{print $1}')
>> echo biggest pack
>> echo $biggest_pack
>>
>> batch_size=$(expr $total_size - $biggest_pack)
>> echo batch size
>> echo $batch_size
> 
> If you were to run
> 
>> git multi-pack-index repack --batch-size=$(./debug.sh | tail -1)
> 
> then nothing would be repacked.
> 
> Instead, I have had a lot more success with the following
> 
>> # Get the 2nd biggest pack size (in bytes) + 1
>> echo $(( $(du -b .git/objects/pack/*pack | sort -n | tail -2 | head -1 | awk '{print $1}') + 1 ))
> 
> I think you also used this approach in t5319 when you used the 3rd
> biggest pack size

The "second biggest pack" is an interesting approach. At first glance it
seems like we will stabilize with one big pack and many similarly-sized
packs. However, small deviations in size are inevitable, and any such
deviation will cause two or more packs to combine and create a new
"second biggest" pack.
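
As a small illustration of the "second biggest + 1" threshold, the
sketch below computes it from a list of sizes given as arguments and
counts how many packs fall strictly below it. Both function names are
made up for this example, and treating "strictly below the batch size"
as the candidate test is an assumption, not a statement about what
multi-pack-index actually does.

```shell
#!/bin/sh
# Sketch only: sizes (in bytes) are given as arguments.

# Second biggest size + 1 (with a single size, that size + 1).
second_biggest_plus_one() {
    local second
    second=$(printf '%s\n' "$@" | sort -n | tail -n 2 | head -n 1)
    echo $((second + 1))
}

# Count the sizes strictly below a threshold (assumed candidate test).
# Arguments: the threshold, then the pack sizes.
count_below() {
    local batch=$1 n=0 s
    shift
    for s in "$@"; do
        if [ "$s" -lt "$batch" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}
```

For sizes 1000, 40, 30 and 20 the threshold is 41, and all three small
packs fall below it.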

> Looking forward to a re-roll of this RFC.

I do plan to submit a new version of the RFC, but it will look quite
different based on the feedback so far. I'm still digesting that
feedback and will take another attempt at it after I wrap up some other
items that have my attention currently.

Thanks!
-Stolee




