On 5/25/07, Junio C Hamano <junkio@xxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> writes: > Nicolas Pitre wrote: >> I wouldn't mind a _separate_ tool that would load a pack index, >> determine object sizes from it, and then extract big objects to write >> them as loose objects ... > > Below we add two new options to git-unpack-objects: > > --min-blob-size=<n>:: Unpacking is only done for objects > larger than or equal to n kB (uncompressed size by Junio). Elsewhere you wanted to use --max-* and that was counted in megs; isn't using kilo here and meg there inconsistent?
For git-repack --max-pack-size=N I used MB to be consistent with git-fast-import. I think it makes sense to use MB everywhere we talk about packfile size. For the old degunking patch's -max-blob-size=N , and this patch's -min-blob-size=N , I was using KB to describe blob size. It looked like I needed finer granularity, at least for experiments. I think if we always use MB for packfile sizes, and KB for blob sizes, we should be OK.
> --force:: Loose objects will be created even if they > already exist in the repository packed. This is an option > I've wanted before for other reasons. ... but if they already exist in the repository as loose objects, do not replace it. Usually we do not overwrite existing loose objects and it is one of the security measure --- if you have an object already, that cannot be touched by somebody who maliciously creats a hash colliding loose object and tries to inject it into your repository via unpack-objects. It's good that you kept this behaviour intact.
I agree. I'll add the "... but" part to the corrected patch's commit msg.
> -Objects that already exist in the repository will *not* be unpacked > -from the pack-file. Therefore, nothing will be unpacked if you use > -this command on a pack-file that exists within the target repository. > +By default, objects that already exist in the repository will *not* > +be unpacked from the pack-file. Therefore, nothing will be unpacked > +if you use this command on a pack-file that exists within the target > +repository, unless you specify --force. I would want to add: If an object already exists unpacked in the repository, it will not be replaced with the copy from the pack, with or without `--force`.
OK, that's clearer.
> -static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file"; > +static const char unpack_usage[] = > +"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file"; Maybe we would want to call it '-f' for consistency. Another possibility is the other way around, giving others a longer synonyms, like --quiet, but this command is plumbing and I do not think long options matters that much, so my preference is to do '-f' not '--force'.
I picked the longer one only because I didn't view it as frequently used. But it will be more used than min-blob-size. I'll change it to -f.
* You already have the size here, so if min_blob_size is set and the size is larger, you do not even have to call write_sha1_file() at all.
The way I read the code, it looks like unpack-objects needs the last argument always to be initialized with the SHA-1 computed from the object contents. Therefore I always need to call write_sha1_file(), even if I don't want it to write anything.
So "ignore" means: negative: never write it out, even if it does not exist. zero: do not write it out if it is available (in pack, or loose, either local or alternate), do write it out otherwise; it is the same as the current behaviour of write_sha1_file(). positive: always write it out. That does not sound like "ignore".
I agree "ignore" is confusing; I will change it to "when", which is more consistent with the _maybe prefix on the function name.
My suggestion would be:
I like this suggestion, but since I need to call the function to get the SHA-1, I don't think I can follow it. I'll send you an updated patch in a moment. Thanks, -- Dana L. How danahow@xxxxxxxxx +1 650 804 5991 cell - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html