Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos

Jeff King <peff@xxxxxxxx> · Tue, 29 Jan 2013 03:29:40 -0500

On Sat, Jan 26, 2013 at 10:32:42PM -0800, Junio C Hamano wrote:

> Both makes sense to me.
> 
> I also wonder if we would be helped by another "repack" mode that
> coalesces small packs into a single one with minimum overhead, and
> run that often from "gc --auto", so that we do not end up having to
> have 50 packfiles.
> 
> When we have 2 or more small and young packs, we could:
> 
>  - iterate over idx files for these packs to enumerate the objects
>    to be packed, replacing read_object_list_from_stdin() step;
> 
>  - always choose to copy the data we have in these existing packs,
>    instead of doing a full prepare_pack(); and
> 
>  - use the order the objects appear in the original packs, bypassing
>    compute_write_order().

I'm not sure. If I understand you correctly, it would basically just be
concatenating packs without trying to do delta compression between the
objects which are ending up in the same pack. So it would save us from
having to do (up to) 50 binary searches to find an object in a pack, but
would not actually save us much space.

I would be interested to see the timing on how quick it is compared to a
real repack, as the I/O that happens during a repack is non-trivial
(although if you are leaving aside the big "main" pack, then it is
probably not bad).

But how do these somewhat mediocre concatenated packs get turned into
real packs? Pack-objects does not consider deltas between objects in the
same pack. And when would you decide to make a real pack? How do you
know you have 50 young and small packs, and not 50 mediocre coalesced
packs?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html