Re: Corrupt name-rev output

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 21.04.22 um 04:11 schrieb Elijah Newren:
> On Tue, Apr 19, 2022 at 8:13 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>
>> Thomas Hurst <tom@xxxxxx> writes:
>>
>>> I've noticed a series of about 20 commits in the HardenedBSD repository
>>> fairly reliably produce garbage names from git name-rev - usually
>>> fragments of another commit, sometimes unprintable nonsense.  Sometimes
>>> it works just fine...
>>>
>>> Here's a quick demo showing how to reproduce the problem:
>>>
>>> % uname -mrs
>>> FreeBSD 13.0-RELEASE-p11 amd64
>>> % git --version
>>> git version 2.35.2
>>> % git clone --bare --mirror https://github.com/HardenedBSD/hardenedBSD.git
>>> % cd hardenedBSD.git
>>> % git rev-list --branches=\* |
>>>   git name-rev --stdin --refs=heads/\* |
>>>   egrep -v '^[0-9a-f]{40}( \([a-zA-Z0-9_/.^~-]+\))?$'
>>> 3eb67b534cab6a78b44b13e4323fd60353003089 (y:    marcel
>>> MFC after:      3 days
>>> Relnotes:       yes
>>> Sponsored by:   ScaleEngine Inc.
>>> Differential Revision:  https://reviews.freebsd.org/D3065
>>> ~3)
>>> 3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (y:    marcel
>>> ...
>>> Passing these commits into name-rev as arguments finds them under
>>> hardened/current/relro~199^2
>>>
>>> git fsck --full does not reveal or fix anything, and the problem also
>>> persists with a build from source from the next branch.
>>>
>>> I was unable to reproduce on an Ubuntu machine with 2.32.0, so I used
>>> that as a starting point for bisection and landed here:
>>>
>>>   3656f842789d25d75da41c6c029470052a573b54
>>>   name-rev: prefer shorter names over following merges
>>
>> commit 3656f842789d25d75da41c6c029470052a573b54
>> Author: Elijah Newren <newren@xxxxxxxxx>
>>
>> Hmph, Elijah, does this ring a bell?
>
> After digging around last night and tonight, this appears to be a poor
> interaction with commit 2d53975488 ("name-rev: release unused name
> strings", 2020-02-04), which frees shared strings and relies on all
> other users of that shared string to update their name, which
> apparently seemed to rely on some intricacies of how the algorithm
> worked that are no longer valid with my change, resulting in some
> use-after-frees (though for some reason valgrind isn't spotting them
> for me, which made it harder to track these down).

So we need better test coverage.

The strings are freed for generation 0 because their new name was always
better for higher generations as well and thus would be replaced and not
looked at again.  I think that assumption doesn't hold anymore because
3656f84278 (name-rev: prefer shorter names over following merges,
2021-12-04) raised the bar for names to be accepted for generation > 0.

> Reverting 2d53975488 fixes the problem.

That's a good band-aid.  Reverting 3656f84278 would help as well, but
change the output.

> Maybe this means we need to have tip_name be a string + a refcount, so
> that we can know when we can safely free it?

Sure, at the cost of storing and maintaining the refcounts.

IIRC I also explored avoiding to append suffixes to candidate name
strings in get_parent_name() and instead storing a reference to the
parent.  Many tag names are shorter than the eight bytes of a pointer
and certainly shorter than an object ID, so while that would reduce the
number of strings to deal with (and thus the need to free them), it
probably would increases the overall memory usage.

>  Adding Rene to the cc
> for comments.  Rene: If it helps, there's a slightly simpler
> reproduction: clone the repo Thomas mentions, and then instead of his
> "rev-list | name-rev | grep" sequence just run:
>
>     git name-rev --refs=heads/\* 3eb67b534cab6a78b44b13e4323fd60353003089




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux