Re: [PATCH] Prevent megablobs from gunking up git packs

Junio C Hamano <junkio@xxxxxxx> · Wed, 23 May 2007 15:08:25 -0700

Dana How <danahow@xxxxxxxxx> writes:

> This patch implements the following:
> 1. git pack-objects takes a new --max-blob-size=N flag,
>    with the effect that only blobs less than N KB are written
>    to the packfile(s).  If a blob was in a pack but violates
>    this limit (perhaps the packs were created by fast-import
>    or max-blob-size was reduced), then a new loose object
>    is written out if needed so the data is not lost.

Why?

I really do not like that "write a new loose object" part
without proper justification.  From your description, I thought
the most natural way to do this is to pretend you did not hear
about large objects at all, by rejecting them early, perhaps
inside add_object_entry() or inside get_object_details(); in
either case you would call sha1_object_info() early instead of
in check_object().
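A minimal C sketch of the early-rejection idea, under loud assumptions: struct obj_info, struct pack_list and add_object_entry_limited() are hypothetical, simplified names, not git's real structures or functions, and the object's type and size are assumed to be known at insertion time, as they would be if sha1_object_info() were called there rather than in check_object(). An oversized blob is simply never queued, so no later step has to write it back out.

```c
#include <stddef.h>

/* Hypothetical, simplified model; not git's actual data structures. */
enum obj_type { OBJ_COMMIT = 1, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct obj_info {
	enum obj_type type;
	unsigned long size; /* uncompressed size in bytes */
};

struct pack_list {
	struct obj_info entries[16];
	size_t nr;
};

/*
 * Queue an object for packing unless it is a blob at or above max_size
 * bytes (max_size == 0 disables the check).  Returns 1 if queued, 0 if
 * rejected as too large, -1 if the list is full.  A rejected object is
 * just not written to the new pack; this function writes no loose copy.
 */
static int add_object_entry_limited(struct pack_list *list,
				    struct obj_info info,
				    unsigned long max_size)
{
	size_t cap = sizeof(list->entries) / sizeof(list->entries[0]);

	if (max_size && info.type == OBJ_BLOB && info.size >= max_size)
		return 0;
	if (list->nr >= cap)
		return -1;
	list->entries[list->nr++] = info;
	return 1;
}
```

The design point is that the size check sits at the one place objects enter the list, so everything downstream can stay ignorant of the limit.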

By the way, is there a fundamental reason that this needs to be
a "blob size" limit?  Wouldn't "max-object-size" be cleaner in
theory, and work the same way in practice?
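To make the blob-size versus object-size question concrete, here is a hypothetical predicate covering both semantics; too_large() and its blobs_only switch are illustrative names only, not anything in git or in the patch.

```c
#include <stdbool.h>

enum obj_type { OBJ_COMMIT = 1, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/*
 * Hypothetical: should an object of this type and size be kept out of
 * the pack?  limit == 0 means no limit.  blobs_only selects
 * "max-blob-size" semantics; otherwise the limit acts as
 * "max-object-size" and applies to every object type.
 */
static bool too_large(enum obj_type type, unsigned long size,
		      unsigned long limit, bool blobs_only)
{
	if (!limit || size < limit)
		return false;
	return !blobs_only || type == OBJ_BLOB;
}
```

The two modes differ only for a commit, tree or tag at or above the limit; with a limit chosen for "megablobs", such objects are typically rare, so both settings would usually reject the same set.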

> 2. git repack inspects repack.maxblobsize.  If set, its
>    value is passed to git pack-objects on the command line.
>    The user should change repack.maxblobsize, NOT specify
>    --max-blob-size=N.

Why not?
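For reference, the repack side of item 2 amounts to forwarding one config value. A minimal C sketch, assuming the value of repack.maxblobsize has already been read from the configuration (NULL when unset); max_blob_size_arg() is a hypothetical helper, and --max-blob-size=N is the flag proposed in the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: build the pack-objects argument from the (already read)
 * repack.maxblobsize value.  Returns 1 if an argument was written to
 * buf, 0 if the option is unset or empty, -1 if buf is too small.
 */
static int max_blob_size_arg(const char *config_value, char *buf, size_t len)
{
	int n;

	if (!config_value || !*config_value)
		return 0;
	n = snprintf(buf, len, "--max-blob-size=%s", config_value);
	if (n < 0 || (size_t)n >= len)
		return -1; /* output was truncated */
	return 1;
}
```

Nothing in this forwarding step depends on whether the user set the config variable or passed the flag by hand; both end up as the same pack-objects argument.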

> This patch is on top of the earlier max-pack-size patch,
> because I thought I needed some behavior it supplied,
> but could be rebased on master if desired.

Your earlier "split according to max-pack-size" will hopefully be
on master shortly.
