Re: [PATCH 09/13] drop objects larger than --blob-limit if specified

On Thu, 5 Apr 2007, Dana How wrote:

> On 4/5/07, Nicolas Pitre <nico@xxxxxxx> wrote:
> > I still consider this feature to make no sense.
> 
> Well, suppose I'm packing my 55GB of data into 2GB
> packfiles.  There seemed to be some agreement that
> limiting packfile size was useful.  700MB is another example.
> 
> Now,  suppose there is an object whose packing would
> result in a packfile larger than the limit.  What should we do?

You error out.

> (1) Refuse to run.  This solution means I can't pack my repository.

Exactly.  If you want packs no larger than 10MB and you have a 100MB 
blob, then you are screwed.  Just lift your pack size limit in that 
case.

> (2) Pack the object anyway and let the packfile size exceed
>     my specification.  Ignoring a clear preference from the user
>     doesn't seem good.

Indeed, it is not.

> (3) Pack the object by itself in its own pack. This is better than the
>     previous since I haven't wrapped up any small object in a pack
>     whose size I don't want to deal with.  The resulting pack is too big,
>     but the original object was also too big so at least I haven't made
>     the problem worse.  But why bother wrapping the object so?
>     I've just made the list of packs to search longer for every access,
>     instead of leaving the big object in the objects/xx directories, which
>     are already used to handle exceptions (usually the more recent objects).
>     In my 55GB example, I have 9 jumbo objects, and this solution
>     would more than double the number of packs to step through.
>     Having them randomly placed in 256 subdirectories seems better.

You forget the case where those jumbo blobs delta well against each 
other.  In that case one pack might well contain all 9 objects, because 
8 of them end up as tiny deltas against the first big one.

> (4) Just leave the jumbo object by itself, unpacked.

Hmmmmm.

> What do you think?

Let's say I wouldn't mind much if it were implemented differently.  The 
objects array is probably the biggest cost in terms of memory usage for 
pack-objects.  When you have 4 million objects, like in the kde repo, 
each new field you add costs between 4 and 16 MB of memory (4,000,000 
entries times 1 to 4 bytes per field).  I think this is too big a cost 
for filtering out a couple of big objects once in a while.

Instead, I think you should apply the filtering in add_object_entry() 
directly and simply skip adding the unwanted object to the list 
altogether.
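
Something like this untested, standalone sketch (max_blob_size, the 
simplified entry list, and the main() driver are all made up here for 
illustration; this is not the actual pack-objects code):

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for the pack-objects object_entry. */
struct object_entry {
	unsigned long size;
	/* every extra field here costs nr_objects * sizeof(field) */
};

static struct object_entry *objects;
static unsigned int nr_objects, nr_alloc;
static unsigned long max_blob_size;	/* 0 means no limit */

/*
 * Filter at entry time: an oversized blob is simply never added
 * to the list, so no per-entry "dropped" flag is needed at all.
 */
static void add_object_entry(unsigned long size)
{
	if (max_blob_size && size > max_blob_size)
		return;	/* skip it; the jumbo blob stays loose */
	if (nr_objects >= nr_alloc) {
		nr_alloc = nr_alloc ? nr_alloc * 2 : 64;
		objects = realloc(objects, nr_alloc * sizeof(*objects));
		if (!objects) {
			perror("realloc");
			exit(1);
		}
	}
	objects[nr_objects++].size = size;
}

int main(void)
{
	max_blob_size = 100;
	add_object_entry(10);	/* kept */
	add_object_entry(5000);	/* skipped */
	add_object_entry(42);	/* kept */
	printf("%u objects queued\n", nr_objects);	/* prints 2 */
	return 0;
}

That way the decision is made once, up front, and nothing downstream 
ever has to know about the dropped object.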


Nicolas
