Re: Unaligned accesses in sha1dc

demerphq <demerphq@xxxxxxxxx> · Fri, 2 Jun 2017 22:17:40 +0200

On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren <martin.agren@xxxxxxxxx> wrote:
>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> Martin Ågren <martin.agren@xxxxxxxxx> writes:
>>>>
>>>>> I looked into this some more. It turns out it is possible to trigger
>>>>> undefined behavior on "next". Here's what I did:
>>>>> ...
>>>>>
>>>>> This "fixes" the problem:
>>>>> ...
>>>>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>>>>> index 3dff80a..d6f4c44 100644
>>>>> --- a/sha1dc/sha1.c
>>>>> +++ b/sha1dc/sha1.c
>>>>> @@ -66,9 +66,9 @@
>>>>> ...
>>>>> With this diff, various tests which seem relevant for SHA-1 pass,
>>>>> including t0013, and the UBSan-error is gone. The second diff is just
>>>>> a monkey-patch. I have no reason to believe I will be able to come up
>>>>> with a proper and complete patch for sha1dc. And I guess such a thing
>>>>> would not really be Git's patch to carry, either. But at least Git
>>>>> could consider whether to keep relying on undefined behavior or not.
>>>>>
>>>>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>>>>> web interface... Sorry about that.
>>>>
>>>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>>>> that the final "fix" would come from his sha1collisiondetection
>>>> repository via Ævar.
>>>>
>>>> In the meantime, I am wondering if it makes sense to merge the
>>>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>>>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>>>> would at least unblock those on platforms v2.13.0 did not work
>>>> correctly at all.
>>>>
>>>> Ævar, thoughts?
>>>
>>> I think we're mixing up several things here, which need to be untangled:
>>>
>>> 1) The sha1dc works just fine on most platforms even with undefined
>>> behavior, as evidenced by 2.13.0 working.
>>
>> Right, with "platform" meaning "combination of hardware-architecture
>> and compiler". Nothing can be said about how the current code behaves
>> on "x86". Such statements can only be made with regard to "x86 and
>> this or that compiler". Even then, short of studying the compiler
>> implementation/documentation in detail, one cannot be certain that
>> seemingly unrelated changes in Git don't make the code do something
>> else entirely.
>
> I think you're veering into a theoretical discussion here that has
> little to no bearing on the practicalities involved here.
>
> Yes if something is undefined behavior in C the compiler &
> architecture is free to do anything they want with it. In practice
> lots of undefined behavior is de-facto standardized across various
> platforms.
>
> As far as I can tell unaligned access is one of those things. I don't
> think there's ever been an x86 chip / compiler that would run this
> code with any semantic differences when it comes to unaligned access,
> and such a chip / compiler is unlikely to ever exist.
>
> I'm not advocating that we rely on undefined behavior willy-nilly,
> just that we should consider the real situation is (i.e. what actual
> architectures / compilers are doing or are likely to do) as opposed to
> the purely theoretical (if you gave a bunch of aliens who'd never
> heard of our technology the ANSI C standard to implement from
> scratch).
>
> Here's a performance test of your patch above against p3400-rebase.sh.
> I don't know how much these error bars from t/perf can be trusted.
> This is over 30 runs with -O3:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   v2.12.0     : 1.25(1.10+0.06)
>   v2.13.0     : 1.21(1.06+0.06) -3.2%
>   origin/next : 1.22(1.04+0.07) -2.4%
>   martin        : 1.23(1.06+0.07) -1.6%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   v2.12.0     : 6.49(3.60+0.52)
>   v2.13.0     : 8.21(4.18+0.55) +26.5%
>   origin/next : 8.27(4.34+0.64) +27.4%
>   martin        : 8.80(4.36+0.62) +35.6%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   v2.12.0     : 6.77(3.56+0.51)
>   v2.13.0     : 4.09(2.67+0.38) -39.6%
>   origin/next : 4.13(2.70+0.36) -39.0%
>   martin        : 4.30(2.80+0.32) -36.5%
>
> And just your patch v.s. next:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   origin/next : 1.22(1.06+0.06)
>   martin      : 1.22(1.06+0.05) +0.0%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   origin/next : 7.54(4.13+0.60)
>   martin      : 7.75(4.34+0.67) +2.8%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   origin/next : 4.19(2.92+0.31)
>   martin      : 4.14(2.84+0.39) -1.2%
>
> It seems to be a bit slower, is that speedup worth the use of
> unaligned access? I genuinely don't know. I'm just interested to find
> what if anything we need to urgently fix in a release version of git.
>
> One data point there is that the fallback blk-sha1 implementation
> we've shipped since 2009 has the same errors about unaligned access as
> before your patch with -fsanitize[..], and looking at the commit
> message this was something Linus knew about at the time, see
> d7c208a92e ("Add new optimized C 'block-sha1' routines", 2009-08-05).
>
> That's strong empirical data suggesting that whatever we want to do
> about unaligned access in the future it's not something which in
> practice would cause wrong sha1 results for the implementation
> shipping with v2.13.0.
>
> As an aside, when I was trying to apply your patch I made a mistake,
> and found that git's test suite could run 100% OK with a bad sha1
> implementation, I didn't apply this part (word diff):
>
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 04b104fe58..d6f4c442b5 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -312 +312 @@ static void sha1_compression_W(uint32_t ihv[5], const
> uint32_t W[80])
> void sha1_compression_states(uint32_t ihv[5], const
> [-uint32_t-]{+uint8_t+} m[64], uint32_t W[80], uint32_t states[80][5])
> @@ -1642 +1642 @@ static void sha1_recompression_step(uint32_t step,
> uint32_t ihvin[5], uint32_t i
> static void sha1_process(SHA1_CTX* ctx, const [-uint32_t-]{+uint8_t+} block[64])
> diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
> index 1d1d2b8d7c..1d181a1403 100644
> --- a/sha1dc/sha1.h
> +++ b/sha1dc/sha1.h
> @@ -21 +21 @@ extern "C" {
> void sha1_compression_states(uint32_t[5], const
> [-uint32_t-]{+uint8_t+}[64], uint32_t[80], uint32_t[80][5]);

That change doesnt look right anyway.

The original code asserts that it will receive a pointer to 64
uint32_t's as the second argument.

The changed code asserts that it will receive a pointer to 64
uint8_t's as the second argument.

If you change the type you will need to change the *number* correspondingly.

I would expect to see the uint32_t[64] to turn into uint8_t[256]

I have noticed that gcc does not necessarily enforce these types of
declarations and I believe that the change might not have any affect
at all depending on how the pointers are being used. (C passes
pointers on the stack, so afaik these things are more hints than
anything else.)

Looking at the code it looks like it conflates endianness with
alignedness when they arent the same thing, except that the majority
of little-endian boxes are x86 which do not require aligned access,
and many big-endian boxes do require aligned access. IOW, even they
are often related they are distinct properties.

Most hash function implementations have code like the following
(extracted and reduced from hv_macro.h in perl.git [which only
supports little-endian hash functions]):

#ifndef U32_ALIGNMENT_REQUIRED
  #if (BYTEORDER == 0x1234 || BYTEORDER == 0x12345678)
    #define U8TO32_LE(ptr)   (*((const U32*)(ptr)))
#elif (BYTEORDER == 0x4321 || BYTEORDER == 0x87654321)
    #if defined(__GNUC__) && (__GNUC__>4 || (__GNUC__==4 && __GNUC_MINOR__>=3))
      #define U8TO32_LE(ptr)   (__builtin_bswap32(*((U32*)(ptr))))
    #endif
  #endif
#endif

#ifndef U8TO16_LE
    /* Without a known fast bswap32 we're just as well off doing this */
  #define U8TO32_LE(ptr)
((U32)(ptr)[0]|(U32)(ptr)[1]<<8|(U32)(ptr)[2]<<16|(U32)(ptr)[3]<<24)
#endif

Of course, in the case of SHA1 it is defined as being big-endian
(making it a relatively poor choice for use on x86), so you would need
to reverse a bit of this logic.

Note the form shown in that last block will work regardless of
endianness or alignedness requirements and should be optimised
properly by most compilers into the most efficient operation.

In other words, if you guys just switch to that kind of pattern this
problem will go away everywhere, and the only issue you will have to
consider is if it makes some oddball platforms too slow.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"