Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sat, 13 Nov 2021 10:00:37 +0100

On Thu, Nov 11 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>
>> Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
>> object type-path combinations, 2021-10-01). When fsck-ing blobs larger
>> than core.bigFileThreshold we'd free() a pointer to uninitialized
>> memory.
>
> s/d we'd/d, we'd/; no need to resend.
>
>> This issue would have been caught by SANITIZE=address, but since it
>> involves core.bigFileThreshold none of the existing tests in our test
>> suite covered it.
>
> s/d none/d, none/; likewise.
>
>> Running them with the "big_file_threshold" in "environment.c" changed
>> to say "6" would have shown this failure, but let's add a dedicated
>> test for this scenario based on Han Xin's report[1].
>
> Yeah, it is a good and focused test.  
>
> By the way, I do not think changing big_file_threshold _blindly_ to
> smaller values, instead of in a focused test like this, is a good
> idea in general.  Some tests check if a file with a normal size that
> is smaller than the threshold correctly is treated as a binary file,
> and lowering threshold for them without understanding what they are
> meant to test would trigger a "bug" that is not a bug at all, for
> example.
>
>> It would be a good follow-up change to add a GIT_TEST_* mode to run
>> all the tests with a low core.bigFileThreshold threshold.
>
> So, no, please don't do that.

Yes it's probably not worth it, and I've got enough dragons to slay as
it is.

I took the commentary you added in 12426e114b2 (diff: do not short-cut
CHECK_SIZE_ONLY check in diff_populate_filespec(), 2017-03-01) as a
suggestion that we might be conflating too many things in
core.bigFileThreshold, but maybe that's just projecting.

I think that setting is probably too much of a kitchen sink grab bag of
stuff for its own good. Any such GIT_TEST_* mode would I think need to
introduce another setting to not have it imply "these files are binary".

Which may be a good idea in general, and it might not. I.e. are there
users who mainly don't want to consider these for packing, but do want
to have "git diff" work on them?

Anyway, even if that were split up we'd still have the remaining tests
that are assuming that something ends up loose or in a pack.

Fixing those is probably a good idea either way, so poking at this might
be a useful canary for someone. I haven't looked in any detail, but a
part of them are probably checking things manually on .git/objects and
could move to "git rev-parse" or whatever.

The other half likely really do care about whether something ends up
loose or not, and would probably benefit from testing "both sides".

None of that's anything I'll pursue now, just idle thoughts from having
looked at these failures a bit, in case anyone's interested.