Re: [PATCH] Prevent megablobs from gunking up git packs

Dana How wrote:
> On 5/22/07, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Dana How wrote:

>>> There's actually an even more extreme example from my day job.
>>> The software team has a project whose files/revisions would be
>>> similar to those in the linux kernel (larger commits, I'm sure).
>>> But they have *ONE* 500MB file they check in because it takes
>>> 2 or 3 days to generate and different people use different versions of it.
>>> I'm sure it has 50+ revisions now. If they converted to git and included
>>> these blobs in their packfile, that's a 25GB uncompressed increase!
>>> *Every* git operation must wade through 10X -- 100X more packfile.
>>> Or it could be kept in 50+ loose objects in objects/xx ,
>>> requiring a few extra syscalls by each user to get a new version.
>>
>> Or keeping those large objects in separate, _kept_ packfile, containing
>> only those objects (which can delta well, even if they are large).
> 
> Yes, I experimented with various changes to git-repack and
> having it create .keep files just before coming up with the maxblobsize
> approach.  The problem with a 12GB+ repo is not only the large
> repack time,  but the fact that the repack time keeps growing with
> the repo size.  So, with split packs, I had repack create .keep
> files for all new packs except the last (fragmentary) one.  The next
> repack would then only repack new stuff plus the single fragmentary
> pack, keeping repack time from growing (until you deleted the .keep
> files [just the ones with "repack" in them] to start over from scratch).
> But this approach is not going to distribute commits and trees all that well.

No, I was thinking of a separate _kept_ pack (so it would not be
repacked unless the -f option is given) containing _only_ the large blobs.
The only difference between this and your proposal is that the megablobs
would live in their own megablobs pack instead of being loose objects.
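A minimal sketch of the selection step for such a pack, assuming a present-day git whose `cat-file --batch-check` accepts a custom format. The function name `select_megablobs` and the `threshold_bytes` cutoff are hypothetical (they stand in for something like the maxblobsize limit). It takes lines of the form `<type> <name> <size>` and returns the names of blobs at or above the cutoff:

```python
from typing import Iterable, List


def select_megablobs(batch_check_lines: Iterable[str], threshold_bytes: int) -> List[str]:
    """Return object names of blobs whose size is >= threshold_bytes.

    Each line is assumed to come from
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)'
    i.e. "<type> <name> <size>". Hypothetical helper, not part of git.
    """
    selected = []
    for line in batch_check_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines (e.g. "<name> missing")
        obj_type, name, size = parts
        # Only blobs go to the megablob pack; commits and trees stay in
        # the normal packs so they keep delta-compressing together.
        if obj_type == "blob" and int(size) >= threshold_bytes:
            selected.append(name)
    return selected
```

The input could be produced with `git rev-list --objects --all | cut -d' ' -f1 | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)'`. The selected names can then be fed on stdin to `git pack-objects .git/objects/pack/pack`, which prints the new pack's hash; creating an empty `pack-<hash>.keep` next to it marks that pack as kept so an ordinary `git repack -a -d` leaves it alone.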

-- 
Jakub Narebski
Poland