[RFC] Packing large repositories

Hi,

I just started experimenting with git on
a large engineering project that has used p4 so far.
Part of a checkout is about 55GB;
after an initial commit and packing I have a 20GB+ packfile.
Of course this is unusable, since the object entries in an .idx
file have only 32 bits in their offset fields.  I conclude that
for such large projects, git-repack/git-pack-objects would need
new options to control the maximum packfile size.
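
To make the limit concrete, here is a rough Python sketch (not git code).
It assumes an index entry is a 4-byte big-endian offset followed by a
20-byte SHA-1; the names encode_idx_entry and fits_in_idx are made up
for illustration.

```python
import struct

# Assumption: each index entry stores the pack offset as a 4-byte
# big-endian unsigned integer, followed by the 20-byte SHA-1.
MAX_OFFSET = 2**32 - 1


def encode_idx_entry(offset: int, sha1: bytes) -> bytes:
    """Encode one (offset, sha1) entry; fail if the offset needs more than 32 bits."""
    if len(sha1) != 20:
        raise ValueError("sha1 must be exactly 20 bytes")
    if not 0 <= offset <= MAX_OFFSET:
        raise OverflowError(f"offset {offset} does not fit in 32 bits")
    return struct.pack(">I", offset) + sha1


def fits_in_idx(pack_size: int) -> bool:
    """True if the last byte of a pack of this size is still addressable."""
    return pack_size - 1 <= MAX_OFFSET
```

With these definitions, fits_in_idx(20 * 2**30) is False, which is the
situation with the 20GB+ packfile described above.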

[ I don't think this affects git-{fetch,receive,send}-pack
since apparently only the pack is transferred and it only uses
the variable-length size and delta base offset encodings
(of course the accumulation of the 7-bit chunks in a 32-bit
variable would need to be corrected, but at least the data
format doesn't change).]
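
As an illustration of the accumulation problem, the sketch below encodes
and decodes the variable-length size in an object header (low 4 bits in
the first byte, then 7 bits per continuation byte). It is Python, not the
git source; the acc_bits parameter is a made-up knob that masks the
accumulator to imitate a fixed-width variable.

```python
def encode_size_header(obj_type: int, size: int) -> bytes:
    """Encode object type and size as a variable-length pack object header."""
    current = (obj_type << 4) | (size & 0x0F)  # type in bits 4-6, low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)  # high bit set: more size bytes follow
        current = size & 0x7F
        size >>= 7
    out.append(current)
    return bytes(out)


def decode_size(data: bytes, acc_bits=None) -> int:
    """Decode the size; if acc_bits is given, truncate the accumulator to that width.

    The masking only imitates a narrow accumulator. In C, shifting by the
    full width or more is undefined, so this is illustrative only.
    """
    size = data[0] & 0x0F
    shift = 4
    i = 0
    while data[i] & 0x80:
        i += 1
        size |= (data[i] & 0x7F) << shift
        shift += 7
        if acc_bits is not None:
            size &= (1 << acc_bits) - 1
    return size
```

For a 5GB blob (type 3), decoding with acc_bits=32 silently yields 1GB,
while decoding without the mask returns the real size. The bytes on the
wire are the same in both cases, which is why only the decoder needs fixing.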

So I am toying with adding a --limit <size> flag to git-repack/git-pack-objects.
It could not be used with --stdout.  If specified, e.g.
 git-repack --limit 2g
then each packfile created would be at most 2^31-1 bytes in size.
Multiple packfiles could therefore be created in one shot.
Thus git-pack-objects could write multiple names to stdout
and git-repack would need to be updated accordingly.
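
A minimal sketch of how such a limit might split the object list, in
Python rather than C. Everything here is hypothetical: parse_limit,
split_into_packs, and the k/m/g suffixes are my assumptions, and the
sketch ignores the pack header and trailer overhead.

```python
# Hypothetical size suffixes for the proposed --limit flag.
SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_limit(text: str) -> int:
    """Turn a string such as '2g' into a byte count."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def split_into_packs(sizes, limit):
    """Greedy, order-preserving split; every pack totals strictly less than limit."""
    packs, current, used = [], [], 0
    for size in sizes:
        if size >= limit:
            raise ValueError(f"object of {size} bytes cannot fit under limit {limit}")
        if current and used + size >= limit:
            # Close the current pack and start a new one.
            packs.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        packs.append(current)
    return packs
```

Each inner list would correspond to one packfile, so the caller would get
back one pack name per list. An object larger than the limit is treated as
an error here; a real implementation would have to decide what to do with it.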

Finally, I wonder if having tree/commit/tag objects mixed into
such large packfiles would be a performance hit.
(Or maybe this will only appear once I have real history,
not just a large initial commit.  But I can say that I now have 48K
data blobs and 9K others.)
To find out, I may experiment with adding a --type=<types> option
to git-repack/git-pack-objects.  Thus typing
 git-repack --limit 2g --type=tree+commit+tag,blob
would cause git-pack-objects to make two passes over its internal
object list. On the first, it would pack tree, commit, and tag objects.
On the second, it would pack blobs. Each pass would write at
least one independent packfile (or more with --limit).  This would also
allow different incremental repacking strategies/schedules for different types.
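
The two-pass idea could look roughly like this. It is a Python sketch
under my own assumptions: the "+" and "," syntax is taken from the example
above, while parse_type_spec and plan_passes are hypothetical names, and
rejecting a type that appears in two groups is my choice, not part of
the proposal.

```python
VALID_TYPES = {"commit", "tree", "blob", "tag"}


def parse_type_spec(spec: str):
    """Parse 'tree+commit+tag,blob' into one set of types per pass."""
    passes, seen = [], set()
    for group in spec.split(","):
        types = set(group.split("+"))
        unknown = types - VALID_TYPES
        if unknown:
            raise ValueError(f"unknown object type(s): {sorted(unknown)}")
        if types & seen:
            raise ValueError(f"type(s) listed twice: {sorted(types & seen)}")
        seen |= types
        passes.append(types)
    return passes


def plan_passes(objects, spec: str):
    """Given (name, type) pairs, return the object names packed in each pass."""
    objects = list(objects)  # the list is scanned once per pass
    return [
        [name for name, obj_type in objects if obj_type in types]
        for types in parse_type_spec(spec)
    ]
```

Objects whose type is not named in any group are simply left out of every
pass in this sketch, which is one way the different repacking schedules
per type could be expressed.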

Comments?

Thanks!
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell