On Monday 16 May 2011, Junio C Hamano wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
> > On Sun, May 15, 2011 at 14:37, Johan Herland <johan@xxxxxxxxxxx> wrote:
> >> The new --max-object-count option behaves similarly to
> >> --max-pack-size, except that the decision to split packs is
> >> determined by the number of objects in the pack, and not by the size
> >> of the pack.
> >
> > Like my note about pack size for this case... I think doing this
> > during writing is too late. We should be aborting the counting phase
> > if the output pack is to stdout and we are going to exceed this limit.
>
> Well, even more important is if this is even useful. What is the user
> trying to prevent from happening, and is it a useful thing?

I initially met this problem at $dayjob, where a user would push a large
pack from an unrelated repo, and the push would be refused by our update
hook, but not without storing the pack on the server first. Obviously, the
main objective of the patch is to prevent the pack from being stored on
the server in the first place. [1]

Based on that, I do not really care exactly what kind of limit I have to
set in order to prevent the push. I can easily enough find usable
thresholds in any of #objects, #commits or pack size.

> I think "do not hog too much disk" (i.e. size) is an understandable wish,
> and max-pack-size imposed on --stdout would be a good approximation for
> that.

Agreed.

> I would understand "this project has only these files, and pushing a tree
> that has 100x leaves than that may be a mistake" (i.e. recursive sum of
> number of entries of an individual tree). I would also sort-of understand
> "do not push too deep a history at once" (i.e. we do not welcome pushing
> a wildly diverged fork that has been allowed to grow for too long).
>
> But I do not think max-object-count is a good approximation for either
> to be useful.

I agree in principle, but currently, it's the only limit that we can
enforce on the server side.

> Without a good answer to the above question, this looks like a "because
> we could", not "because it is useful", feature.

Ok, I can drop the client side part of this patch (including the
limit-object-count capability), but I would like to retain the server side
part (abort if the pack header reveals too many objects), since otherwise
I would have to wait for all users to upgrade to a "limit-*"-aware client
before I can consider this problem truly solved.

...Johan

[1]: There is a followup to this problem, where a subsequent 'git gc' will
find all the objects in the pushed (but rejected by the update hook) pack
to be unreferenced, and then explode them into loose objects. In one case,
a user pushed a 300 MB pack to the server that was then exploded into 5 GB
of loose objects by a subsequent 'git gc'. Needless to say, we now run
'git gc --prune=now' as a workaround until this can be fixed.

--
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
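
The server-side check Johan describes only needs the fixed 12-byte pack
header: the signature "PACK", a 4-byte version number, and a 4-byte object
count, the latter two in network byte order. The following is a minimal,
standalone sketch of that idea, not the actual patch; the program, the
hard-coded limit, and the messages are invented for illustration.

/*
 * Sketch only: read a pack stream on stdin, peek at the 12-byte header,
 * and refuse the pack if it announces more objects than a fixed limit.
 * In git itself the equivalent check would belong where the incoming
 * pack is first read (index-pack / unpack-objects), before any pack
 * data is written to disk.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define MAX_OBJECT_COUNT 100000u	/* example threshold, not a real default */

int main(void)
{
	unsigned char hdr[12];
	uint32_t version, nr_objects;

	/* Pack header: "PACK", 4-byte version, 4-byte object count (big-endian). */
	if (fread(hdr, 1, sizeof(hdr), stdin) != sizeof(hdr)) {
		fprintf(stderr, "fatal: short read on pack header\n");
		return 1;
	}
	if (memcmp(hdr, "PACK", 4)) {
		fprintf(stderr, "fatal: input is not a pack file\n");
		return 1;
	}
	memcpy(&version, hdr + 4, 4);
	memcpy(&nr_objects, hdr + 8, 4);
	version = ntohl(version);
	nr_objects = ntohl(nr_objects);

	if (nr_objects > MAX_OBJECT_COUNT) {
		fprintf(stderr, "fatal: pack announces %u objects, limit is %u\n",
			(unsigned)nr_objects, (unsigned)MAX_OBJECT_COUNT);
		return 1;
	}
	printf("pack version %u, %u objects: within limit\n",
	       (unsigned)version, (unsigned)nr_objects);
	return 0;
}

Running it against an existing pack, e.g. './check-pack-header <
.git/objects/pack/pack-<hash>.pack', prints the announced object count or
refuses the pack. The point of checking the header rather than the pack
contents is that the decision can be made after 12 bytes, before the
server has stored anything.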