Re: [PATCH] Enhance unpack-objects for extracting large objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/25/07, Junio C Hamano <junkio@xxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> writes:
> Nicolas Pitre wrote:
>> I wouldn't mind a _separate_ tool that would load a pack index,
>> determine object sizes from it, and then extract big objects to write
>> them as loose objects ...
>
> Below we add two new options to git-unpack-objects:
>
> --min-blob-size=<n>::  Unpacking is only done for objects
> larger than or equal to n kB (uncompressed size by Junio).

Elsewhere you wanted to use --max-* and that was counted in megs;
isn't using kilo here and meg there inconsistent?
For git-repack --max-pack-size=N I used MB to be consistent
with git-fast-import.  I think it makes sense to use MB everywhere
we talk about packfile size.

For the old degunking patch's -max-blob-size=N ,  and this patch's
-min-blob-size=N ,  I was using KB to describe blob size.
It looked like I needed finer granularity,  at least for experiments.

I think if we always use MB for packfile sizes, and KB
for blob sizes,  we should be OK.

> --force::  Loose objects will be created even if they
> already exist in the repository packed.  This is an option
> I've wanted before for other reasons.

        ... but if they already exist in the repository as loose
        objects, do not replace it.

Usually we do not overwrite existing loose objects and it is one
of the security measure --- if you have an object already, that
cannot be touched by somebody who maliciously creats a hash
colliding loose object and tries to inject it into your
repository via unpack-objects.  It's good that you kept this
behaviour intact.
I agree.
I'll add the "... but" part to the corrected patch's commit msg.

> -Objects that already exist in the repository will *not* be unpacked
> -from the pack-file.  Therefore, nothing will be unpacked if you use
> -this command on a pack-file that exists within the target repository.
> +By default,  objects that already exist in the repository will *not*
> +be unpacked from the pack-file.  Therefore, nothing will be unpacked
> +if you use this command on a pack-file that exists within the target
> +repository,  unless you specify --force.
I would want to add:
        If an object already exists unpacked in the repository,
        it will not be replaced with the copy from the pack,
        with or without `--force`.
OK, that's clearer.

> -static const char unpack_usage[] = "git-unpack-objects [-n] [-q] [-r] < pack-file";
> +static const char unpack_usage[] =
> +"git-unpack-objects [-n] [-q] [-r] [--force] [--min-blob-size=N] < pack-file";

Maybe we would want to call it '-f' for consistency.  Another
possibility is the other way around, giving others a longer
synonyms, like --quiet, but this command is plumbing and I do
not think long options matters that much, so my preference is to
do '-f' not '--force'.
I picked the longer one only because I didn't view it as frequently used.
But it will be more used than min-blob-size.  I'll change it to -f.

 * You already have the size here, so if min_blob_size is set
   and the size is larger, you do not even have to call
   write_sha1_file() at all.
The way I read the code,  it looks like unpack-objects needs
the last argument always to be initialized with the SHA-1 computed
from the object contents.  Therefore I always need to call
write_sha1_file(),  even if I don't want it to write anything.

So "ignore" means:
        negative:       never write it out, even if it does not exist.
        zero:           do not write it out if it is available (in pack,
                        or loose, either local or alternate), do
                        write it out otherwise; it is the same
                        as the current behaviour of write_sha1_file().
        positive:       always write it out.
That does not sound like "ignore".
I agree "ignore" is confusing;
I will change it to "when",  which is more consistent with the
_maybe prefix on the function name.

My suggestion would be:
I like this suggestion,  but since I need to call the function
to get the SHA-1,  I don't think I can follow it.

I'll send you an updated patch in a moment.

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux