On Sat, Jun 28, 2014 at 1:56 AM, Thomas Braun <thomas.braun@xxxxxxxxxxxxxxxxxxx> wrote: >> The name is misleading and forced me to read it twice before I >> realized that this is "populating the is-binary bit". It might make >> it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or >> perhaps DIFF_POPULATE_CHECK_BINARY or something. For consistency, >> the other bit may want to be also renamed from SIZE_ONLY to either >> >> (1) CHECK_SIZE_ONLY >> >> (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally >> make SIZE_ONLY the union of two >> >> I do not have strong preference either way; the latter may be more >> logical in that "not loading contents" and "check size" are sort of >> orthogonal in that you can later choose to check, without loading >> contents, only the binary-ness without checking size, but no calles >> that passes a non-zero flag to the populate-filespec function will >> want to slurp in the contents in practice, so in that sense we could >> declare that the NO_CONENTS bit is implied. Will do (and probably go with (1) as I still prefer zero as "good defaults") >> But more importantly, would this patch actually help? Well yes as demonstrated by the new test ;-) Unfortunately the scope of help is limited to --stat.. I should have done more thorough testing. >> For one >> thing, this wouldn't (and shouldn't) help if the user wants --binary >> diff out of us anyway, I suspect, but I wonder what the following >> codepath in the builtin_diff() function would do: >> >> ... >> } else if (!DIFF_OPT_TST(o, TEXT) && >> ( (!textconv_one && diff_filespec_is_binary(one)) || >> (!textconv_two && diff_filespec_is_binary(two)) )) { >> if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0) >> die("unable to read files to diff"); >> /* Quite common confusing case */ >> if (mf1.size == mf2.size && >> !memcmp(mf1.ptr, mf2.ptr, mf1.size)) { >> if (must_show_header) >> fprintf(o->file, "%s", header.buf); >> goto free_ab_and_return; >> } >> fprintf(o->file, "%s", header.buf); >> strbuf_reset(&header); >> if (DIFF_OPT_TST(o, BINARY)) >> emit_binary_diff(o->file, &mf1, &mf2, line_prefix); >> else >> fprintf(o->file, "%sBinary files %s and %s differ\n", >> line_prefix, lbl[0], lbl[1]); >> o->found_changes = 1; >> } else { >> ... >> >> If we weren't told with --text/-a to force textual output, and >> at least one of the sides is marked as binary (and this patch marks >> a large blob as binary unless attributes says otherwise), we still >> call fill_mmfile() on them to slurp the contents to be compared, no? >> >> And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to >> check if the sides are identical, so... > > Good point. So how about an additional change roughly sketched as > > @@ -2223,6 +2223,14 @@ struct userdiff_driver *get_textconv(struct > diff_filespec *one) > return userdiff_get_textconv(one->driver); > } > > +/* read the files in small chunks into memory and compare them */ > +static int filecmp_chunked(struct diff_filespec *one, > + struct diff_filespec *two) > +{ > + // TODO add implementation > + return 0; > +} > + We have object streaming interface to do similar like this. In fact index-pack already does large file memcmp() for hash collision test. I think I can move some code around and support large file in this code path.. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html