Re: [PATCH RFC v2 02/25] stackdepot: prevent Clang from optimizing away stackdepot_memcmp()

Alexander Potapenko <glider@xxxxxxxxxx> · Thu, 7 Nov 2019 10:22:32 +0100



On Thu, Nov 7, 2019 at 10:04 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Thu, Nov 7, 2019 at 7:08 AM Sergey Senozhatsky
> <sergey.senozhatsky.work@xxxxxxxxx> wrote:
> >
> > On (19/11/06 12:43), Alexander Potapenko wrote:
> > > > On (19/10/30 15:22), glider@xxxxxxxxxx wrote:
> > > > > @@ -163,6 +163,11 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
> > > > >                       unsigned int n)
> > > > >  {
> > > > >       for ( ; n-- ; u1++, u2++) {
> > > > > +             /*
> > > > > +              * Prevent Clang from replacing this function with a bcmp()
> > > > > +              * call.
> > > > > +              */
> > > > > +             barrier();
> > > > >               if (*u1 != *u2)
> > > > >                       return 1;
> > > > >       }
> > > >
> > > > Would 'volatile' do the trick?
> > > It does. I can replace the barrier with a volatile if you think that's better.
> > > However this'll add a checkpatch warning, as volatiles are discouraged
> > > for synchronization (although in this case there's no synchronization)
> >
> > Yeah, 'volatile' in this case will do what it sort of meant to do - prevent
> > compiler optimizations. So, like you said, it's not a synchronization issue
> > and we don't 'volatile' data structures.
>
> The normal way to do a volatile access would be
> READ_ONCE()/WRITE_ONCE(), but that seems stronger than
> the barrier() here. I'd just stick to adding a barrier.
I actually like the READ_ONCE idea more, as READ_ONCE is really a
documented way to prevent the compiler from merging reads, which is
what we want here.

I also thought that the original barrier() statement was just a
compiler barrier, which didn't introduce any additional CPU
instructions.
Turns out I was wrong, and barrier() also serves as a memory barrier.
This doesn't really matter in this case, because this memcmp function
isn't performance-critical (we are preventing Clang from optimizing
it, after all).
Still, READ_ONCE and WRITE_ONCE might be even cheaper, as they are
just relaxed memory accesses implemented using volatile.
> > Do you need to do barrier() on every iteration? Does clang behave if
> > you do one barrier() instead of 'n' barrier()-s?
>
> If it does things right, it would make that a single-byte copy plus a call
> to bcmp(). I certainly wouldn't want to have an implementation that relies
> on the compiler making sub-optimal decisions ;-)
That's a really good point :)
>       Arnd


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg