On Monday 16 May 2011, Junio C Hamano wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
> > On Sun, May 15, 2011 at 14:37, Johan Herland <johan@xxxxxxxxxxx> wrote:
> >> The new --max-object-count option behaves similarly to
> >> --max-pack-size, except that the decision to split packs is
> >> determined by the number of objects in the pack, and not by the size
> >> of the pack.
> >
> > Like my note about pack size for this case... I think doing this
> > during writing is too late. We should be aborting the counting phase
> > if the output pack is to stdout and we are going to exceed this limit.
>
> Well, even more important is if this is even useful. What is the user
> trying to prevent from happening, and is it a useful thing?

I initially met this problem at $dayjob, where a user would push a large
pack from an unrelated repo, and the push would be refused by our update
hook, but not without storing the pack on the server first. Obviously, the
main objective of the patch is to prevent the pack from being stored on
the server in the first place. [1]

Based on that, I do not really care exactly what kind of limit I have to
set in order to prevent the push. I can easily enough find usable
thresholds in any of #objects, #commits or pack size.

> I think "do not hog too much disk" (i.e. size) is an understandable wish,
> and max-pack-size imposed on --stdout would be a good approximation for
> that.

Agreed.

> I would understand "this project has only these files, and pushing a tree
> that has 100x leaves than that may be a mistake" (i.e. recursive sum of
> number of entries of an individual tree). I would also sort-of understand
> "do not push too deep a history at once" (i.e. we do not welcome pushing
> a wildly diverged fork that has been allowed to grow for too long).
>
> But I do not think max-object-count is a good approximation for either
> to be useful.

I agree in principle, but currently, it's the only limit that we can
enforce on the server side.

> Without a good answer to the above question, this looks like a "because
> we could", not "because it is useful", feature.

Ok, I can drop the client side part of this patch (including the
limit-object-count capability), but I would like to retain the server side
part (abort if the pack header reveals too many objects), since otherwise
I would have to wait for all users to upgrade to a "limit-*"-aware client
before I can consider this problem truly solved.

...Johan

[1]: There is a followup to this problem, where a subsequent 'git gc' will
find all the objects in the pushed (but rejected by the update hook) pack
to be unreferenced, and then explode them into loose objects. In one case,
a user pushed a 300 MB pack to the server that was then exploded into 5 GB
of loose objects by a subsequent 'git gc'. Needless to say, we now run
'git gc --prune=now' as a workaround until this can be fixed.

--
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
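
The server-side check Johan describes only needs the fixed 12-byte pack
header: the signature "PACK", a 4-byte version number, and a 4-byte object
count, the latter two in network byte order. The following is a minimal,
standalone sketch of that idea, not the actual patch; the program, the
hard-coded limit, and the messages are invented for illustration.

/*
 * Sketch only: read a pack stream on stdin, peek at the 12-byte header,
 * and refuse the pack if it announces more objects than a fixed limit.
 * In git itself the equivalent check would belong where the incoming
 * pack is first read (index-pack / unpack-objects), before any pack
 * data is written to disk.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define MAX_OBJECT_COUNT 100000u	/* example threshold, not a real default */

int main(void)
{
	unsigned char hdr[12];
	uint32_t version, nr_objects;

	/* Pack header: "PACK", 4-byte version, 4-byte object count (big-endian). */
	if (fread(hdr, 1, sizeof(hdr), stdin) != sizeof(hdr)) {
		fprintf(stderr, "fatal: short read on pack header\n");
		return 1;
	}
	if (memcmp(hdr, "PACK", 4)) {
		fprintf(stderr, "fatal: input is not a pack file\n");
		return 1;
	}
	memcpy(&version, hdr + 4, 4);
	memcpy(&nr_objects, hdr + 8, 4);
	version = ntohl(version);
	nr_objects = ntohl(nr_objects);

	if (nr_objects > MAX_OBJECT_COUNT) {
		fprintf(stderr, "fatal: pack announces %u objects, limit is %u\n",
			(unsigned)nr_objects, (unsigned)MAX_OBJECT_COUNT);
		return 1;
	}
	printf("pack version %u, %u objects: within limit\n",
	       (unsigned)version, (unsigned)nr_objects);
	return 0;
}

Running it against an existing pack, e.g. './check-pack-header <
.git/objects/pack/pack-<hash>.pack', prints the announced object count or
refuses the pack. The point of checking the header rather than the pack
contents is that the decision can be made after 12 bytes, before the
server has stored anything.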