Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos

Jeff King <peff@xxxxxxxx> writes:

>> I also wonder if we would be helped by another "repack" mode that
>> coalesces small packs into a single one with minimum overhead, and
>> run that often from "gc --auto", so that we do not end up having to
>> have 50 packfiles.
>> ...
>
> I'm not sure. If I understand you correctly, it would basically just be
> concatenating packs without trying to do delta compression between the
> objects which are ending up in the same pack. So it would save us from
> having to do (up to) 50 binary searches to find an object in a pack, but
> would not actually save us much space.

The point is not about space.  Disk is cheap, and concatenation does
not make things any worse than they already are for your target
audience, namely a fetch-only repository that only ever runs
"gc --auto", where nobody passes "-f" to "repack" to force
recomputation of deltas.

What I was looking for was a way to reduce the runtime penalty we
pay every time we run git in such a repository.

 - Object look-up cost will become log2(50*n) from 50*log2(n) in the
   worst case, which is a 50*log2(n)/log2(50*n) improvement and
   approaches 50-fold as n grows;

 - System resource cost we incur by having to keep 50 file
   descriptors open and maintaining 50 mmap windows will shrink
   50-fold;

 - Anything else I missed?
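
To put rough numbers on the look-up item above, here is a minimal
sketch in Python.  It assumes a simplified model that is not git's
actual code: each pack index is a plain sorted list searched by
binary search, every one of the 50 packs holds n objects, and the
object is missing from all of them (the worst case).  The function
names are made up for illustration, and the model ignores the
first-byte fan-out table in real pack indexes.

```python
import math


def count_probes(sorted_ids, target):
    """Binary-search sorted_ids for target; return (probes, found)."""
    lo, hi, probes = 0, len(sorted_ids), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_ids[mid] < target:
            lo = mid + 1
        elif sorted_ids[mid] > target:
            hi = mid
        else:
            return probes, True
    return probes, False


def estimated_probes(packs, objects_per_pack):
    """Return (separate, merged) worst-case probe estimates.

    separate: one binary search in each of `packs` indexes.
    merged:   one binary search in a single combined index.
    """
    separate = packs * math.log2(objects_per_pack)
    merged = math.log2(packs * objects_per_pack)
    return separate, merged


def speedup(packs, objects_per_pack):
    """Ratio of separate-pack probes to merged-pack probes."""
    separate, merged = estimated_probes(packs, objects_per_pack)
    return separate / merged


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        separate, merged = estimated_probes(50, n)
        print(f"n={n:>10}: 50 packs ~{separate:7.1f} probes, "
              f"1 pack ~{merged:5.1f} probes, "
              f"ratio ~{speedup(50, n):4.1f}x")
```

The ratio grows with n but stays below 50, because the merged index
still pays log2(50) extra probes on top of log2(n).  An object that
does exist is found, on average, after searching about half of the
packs, so the typical gain is smaller than this worst-case figure.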

> I would be interested to see the timing on how quick it is compared to a
> real repack,...

Yes, that is what I meant by "wonder if we would be helped by" ;-)

> But how do these somewhat mediocre concatenated packs get turned into
> real packs?

How do they get processed in a fetch-only repository that
sometimes runs "gc --auto" today?  By running "repack -a -d -f"
occasionally, perhaps?

At some point, you would need to run a repack that involves a real
object-graph traversal that feeds you the paths for objects to
obtain a reasonably compressed pack.  We can inspect existing packs
and guess a rough traversal order for commits, but for trees and
blobs, you cannot unify existing delta chains from multiple packs
effectively with data in the pack alone.
