Re: Unaligned accesses in sha1dc

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 2 Jun 2017 22:38:39 +0200

On Fri, Jun 2, 2017 at 10:17 PM, demerphq <demerphq@xxxxxxxxx> wrote:
> On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren <martin.agren@xxxxxxxxx> wrote:
>>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>>>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>>> Martin Ågren <martin.agren@xxxxxxxxx> writes:
>>>>>
>>>>>> I looked into this some more. It turns out it is possible to trigger
>>>>>> undefined behavior on "next". Here's what I did:
>>>>>> ...
>>>>>>
>>>>>> This "fixes" the problem:
>>>>>> ...
>>>>>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>>>>>> index 3dff80a..d6f4c44 100644
>>>>>> --- a/sha1dc/sha1.c
>>>>>> +++ b/sha1dc/sha1.c
>>>>>> @@ -66,9 +66,9 @@
>>>>>> ...
>>>>>> With this diff, various tests which seem relevant for SHA-1 pass,
>>>>>> including t0013, and the UBSan-error is gone. The second diff is just
>>>>>> a monkey-patch. I have no reason to believe I will be able to come up
>>>>>> with a proper and complete patch for sha1dc. And I guess such a thing
>>>>>> would not really be Git's patch to carry, either. But at least Git
>>>>>> could consider whether to keep relying on undefined behavior or not.
>>>>>>
>>>>>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>>>>>> web interface... Sorry about that.
>>>>>
>>>>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>>>>> that the final "fix" would come from his sha1collisiondetection
>>>>> repository via Ævar.
>>>>>
>>>>> In the meantime, I am wondering if it makes sense to merge the
>>>>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>>>>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>>>>> would at least unblock those on platforms v2.13.0 did not work
>>>>> correctly at all.
>>>>>
>>>>> Ævar, thoughts?
>>>>
>>>> I think we're mixing up several things here, which need to be untangled:
>>>>
>>>> 1) The sha1dc works just fine on most platforms even with undefined
>>>> behavior, as evidenced by 2.13.0 working.
>>>
>>> Right, with "platform" meaning "combination of hardware-architecture
>>> and compiler". Nothing can be said about how the current code behaves
>>> on "x86". Such statements can only be made with regard to "x86 and
>>> this or that compiler". Even then, short of studying the compiler
>>> implementation/documentation in detail, one cannot be certain that
>>> seemingly unrelated changes in Git don't make the code do something
>>> else entirely.
>>
>> I think you're veering into a theoretical discussion here that has
>> little to no bearing on the practicalities involved here.
>>
>> Yes if something is undefined behavior in C the compiler &
>> architecture is free to do anything they want with it. In practice
>> lots of undefined behavior is de-facto standardized across various
>> platforms.
>>
>> As far as I can tell unaligned access is one of those things. I don't
>> think there's ever been an x86 chip / compiler that would run this
>> code with any semantic differences when it comes to unaligned access,
>> and such a chip / compiler is unlikely to ever exist.
>>
>> I'm not advocating that we rely on undefined behavior willy-nilly,
>> just that we should consider the real situation is (i.e. what actual
>> architectures / compilers are doing or are likely to do) as opposed to
>> the purely theoretical (if you gave a bunch of aliens who'd never
>> heard of our technology the ANSI C standard to implement from
>> scratch).
>>
>> Here's a performance test of your patch above against p3400-rebase.sh.
>> I don't know how much these error bars from t/perf can be trusted.
>> This is over 30 runs with -O3:
>>
>> - 3400.2: rebase on top of a lot of unrelated changes
>>   v2.12.0     : 1.25(1.10+0.06)
>>   v2.13.0     : 1.21(1.06+0.06) -3.2%
>>   origin/next : 1.22(1.04+0.07) -2.4%
>>   martin        : 1.23(1.06+0.07) -1.6%
>> - 3400.4: rebase a lot of unrelated changes without split-index
>>   v2.12.0     : 6.49(3.60+0.52)
>>   v2.13.0     : 8.21(4.18+0.55) +26.5%
>>   origin/next : 8.27(4.34+0.64) +27.4%
>>   martin        : 8.80(4.36+0.62) +35.6%
>> - 3400.6: rebase a lot of unrelated changes with split-index
>>   v2.12.0     : 6.77(3.56+0.51)
>>   v2.13.0     : 4.09(2.67+0.38) -39.6%
>>   origin/next : 4.13(2.70+0.36) -39.0%
>>   martin        : 4.30(2.80+0.32) -36.5%
>>
>> And just your patch v.s. next:
>>
>> - 3400.2: rebase on top of a lot of unrelated changes
>>   origin/next : 1.22(1.06+0.06)
>>   martin      : 1.22(1.06+0.05) +0.0%
>> - 3400.4: rebase a lot of unrelated changes without split-index
>>   origin/next : 7.54(4.13+0.60)
>>   martin      : 7.75(4.34+0.67) +2.8%
>> - 3400.6: rebase a lot of unrelated changes with split-index
>>   origin/next : 4.19(2.92+0.31)
>>   martin      : 4.14(2.84+0.39) -1.2%
>>
>> It seems to be a bit slower, is that speedup worth the use of
>> unaligned access? I genuinely don't know. I'm just interested to find
>> what if anything we need to urgently fix in a release version of git.
>>
>> One data point there is that the fallback blk-sha1 implementation
>> we've shipped since 2009 has the same errors about unaligned access as
>> before your patch with -fsanitize[..], and looking at the commit
>> message this was something Linus knew about at the time, see
>> d7c208a92e ("Add new optimized C 'block-sha1' routines", 2009-08-05).
>>
>> That's strong empirical data suggesting that whatever we want to do
>> about unaligned access in the future it's not something which in
>> practice would cause wrong sha1 results for the implementation
>> shipping with v2.13.0.
>>
>> As an aside, when I was trying to apply your patch I made a mistake,
>> and found that git's test suite could run 100% OK with a bad sha1
>> implementation, I didn't apply this part (word diff):
>>
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 04b104fe58..d6f4c442b5 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -312 +312 @@ static void sha1_compression_W(uint32_t ihv[5], const
>> uint32_t W[80])
>> void sha1_compression_states(uint32_t ihv[5], const
>> [-uint32_t-]{+uint8_t+} m[64], uint32_t W[80], uint32_t states[80][5])
>> @@ -1642 +1642 @@ static void sha1_recompression_step(uint32_t step,
>> uint32_t ihvin[5], uint32_t i
>> static void sha1_process(SHA1_CTX* ctx, const [-uint32_t-]{+uint8_t+} block[64])
>> diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
>> index 1d1d2b8d7c..1d181a1403 100644
>> --- a/sha1dc/sha1.h
>> +++ b/sha1dc/sha1.h
>> @@ -21 +21 @@ extern "C" {
>> void sha1_compression_states(uint32_t[5], const
>> [-uint32_t-]{+uint8_t+}[64], uint32_t[80], uint32_t[80][5]);
>
>
> That change doesnt look right anyway.
>
> The original code asserts that it will receive a pointer to 64
> uint32_t's as the second argument.
>
> The changed code asserts that it will receive a pointer to 64
> uint8_t's as the second argument.
>
> If you change the type you will need to change the *number* correspondingly.
>
> I would expect to see the uint32_t[64] to turn into uint8_t[256]

Indeed, the full change is one where "uint32_t m[16]" is changed to
"uint8_t m[64]". See Martin's patch upthread.

The word-diff I posted is in the context of my misapplying that patch
(since GMail destroyed it I manually typed in the change). That
produced "uint32_t m[16]" -> "uint32_t m[64]" instead of Martin's
version.

That's obviously incorrect and results in a broken SHA-1
implementation, but git's test suite isn't exhaustive enough that
nothing in t/ caught it, only the t/perf rebase performance test.

> I have noticed that gcc does not necessarily enforce these types of
> declarations and I believe that the change might not have any affect
> at all depending on how the pointers are being used. (C passes
> pointers on the stack, so afaik these things are more hints than
> anything else.)
>
> Looking at the code it looks like it conflates endianness with
> alignedness when they arent the same thing, except that the majority
> of little-endian boxes are x86 which do not require aligned access,
> and many big-endian boxes do require aligned access. IOW, even they
> are often related they are distinct properties.
>
> Most hash function implementations have code like the following
> (extracted and reduced from hv_macro.h in perl.git [which only
> supports little-endian hash functions]):
>
> #ifndef U32_ALIGNMENT_REQUIRED
>   #if (BYTEORDER == 0x1234 || BYTEORDER == 0x12345678)
>     #define U8TO32_LE(ptr)   (*((const U32*)(ptr)))
> #elif (BYTEORDER == 0x4321 || BYTEORDER == 0x87654321)
>     #if defined(__GNUC__) && (__GNUC__>4 || (__GNUC__==4 && __GNUC_MINOR__>=3))
>       #define U8TO32_LE(ptr)   (__builtin_bswap32(*((U32*)(ptr))))
>     #endif
>   #endif
> #endif
>
> #ifndef U8TO16_LE
>     /* Without a known fast bswap32 we're just as well off doing this */
>   #define U8TO32_LE(ptr)
> ((U32)(ptr)[0]|(U32)(ptr)[1]<<8|(U32)(ptr)[2]<<16|(U32)(ptr)[3]<<24)
> #endif
>
> Of course, in the case of SHA1 it is defined as being big-endian
> (making it a relatively poor choice for use on x86), so you would need
> to reverse a bit of this logic.
>
> Note the form shown in that last block will work regardless of
> endianness or alignedness requirements and should be optimised
> properly by most compilers into the most efficient operation.
>
> In other words, if you guys just switch to that kind of pattern this
> problem will go away everywhere, and the only issue you will have to
> consider is if it makes some oddball platforms too slow.

I think this is something Marc Stevens & co would be very interested
to hear about at
https://github.com/cr-marcstevens/sha1collisiondetection

He's CC'd here but I suspect he's not keeping up with this deluge of
E-Mails from the git project very closely, we're just using his
software as-is.