Re: [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jun 28, 2014 at 1:56 AM, Thomas Braun
<thomas.braun@xxxxxxxxxxxxxxxxxxx> wrote:
>> The name is misleading and forced me to read it twice before I
>> realized that this is "populating the is-binary bit".  It might make
>> it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or
>> perhaps DIFF_POPULATE_CHECK_BINARY or something.  For consistency,
>> the other bit may want to be also renamed from SIZE_ONLY to either
>>
>>  (1) CHECK_SIZE_ONLY
>>
>>  (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally
>>      make SIZE_ONLY the union of two
>>
>> I do not have strong preference either way; the latter may be more
>> logical in that "not loading contents" and "check size" are sort of
>> orthogonal in that you can later choose to check, without loading
>> contents, only the binary-ness without checking size, but no calles
>> that passes a non-zero flag to the populate-filespec function will
>> want to slurp in the contents in practice, so in that sense we could
>> declare that the NO_CONENTS bit is implied.

Will do (and probably go with (1) as I still prefer zero as "good defaults")

>> But more importantly, would this patch actually help?

Well yes as demonstrated by the new test ;-) Unfortunately the scope
of help is limited to --stat.. I should have done more thorough
testing.

>> For one
>> thing, this wouldn't (and shouldn't) help if the user wants --binary
>> diff out of us anyway, I suspect, but I wonder what the following
>> codepath in the builtin_diff() function would do:
>>
>>               ...
>>       } else if (!DIFF_OPT_TST(o, TEXT) &&
>>           ( (!textconv_one && diff_filespec_is_binary(one)) ||
>>             (!textconv_two && diff_filespec_is_binary(two)) )) {
>>               if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
>>                       die("unable to read files to diff");
>>               /* Quite common confusing case */
>>               if (mf1.size == mf2.size &&
>>                   !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
>>                       if (must_show_header)
>>                               fprintf(o->file, "%s", header.buf);
>>                       goto free_ab_and_return;
>>               }
>>               fprintf(o->file, "%s", header.buf);
>>               strbuf_reset(&header);
>>               if (DIFF_OPT_TST(o, BINARY))
>>                       emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
>>               else
>>                       fprintf(o->file, "%sBinary files %s and %s differ\n",
>>                               line_prefix, lbl[0], lbl[1]);
>>               o->found_changes = 1;
>>       } else {
>>               ...
>>
>> If we weren't told with --text/-a to force textual output, and
>> at least one of the sides is marked as binary (and this patch marks
>> a large blob as binary unless attributes says otherwise), we still
>> call fill_mmfile() on them to slurp the contents to be compared, no?
>>
>> And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to
>> check if the sides are identical, so...
>
> Good point. So how about an additional change roughly sketched as
>
> @@ -2223,6 +2223,14 @@ struct userdiff_driver *get_textconv(struct
> diff_filespec *one)
>         return userdiff_get_textconv(one->driver);
>  }
>
> +/* read the files in small chunks into memory and compare them */
> +static int filecmp_chunked(struct diff_filespec *one,
> +       struct diff_filespec *two)
> +{
> +       // TODO add implementation
> +       return 0;
> +}
> +


We have object streaming interface to do similar like this. In fact
index-pack already does large file memcmp() for hash collision test. I
think I can move some code around and support large file in this code
path..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]