Re: [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Wed, 29 Sep 2021 15:50:52 -0400 (EDT)

----- On Sep 29, 2021, at 10:54 AM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Wed, Sep 29, 2021 at 7:47 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> And if there *is* need ("look, we have that same store in both the if-
>> and the else-statement" or whatever), then say so, and state that
>> thing.
> 
> Side note: I'd also like the commit that introduces this to talk very
> explicitly about the particular case that it is used doe and that it
> fixes. No "this can happen". A "this happened, here's the _actual_
> wrong code generation, and look how this new ctrl_dep() macro fixes
> it".
> 
> When it's this subtle, I don't want theoretical arguments. I want
> actual outstanding and real bugs.
> 
> Because I get the feeling that there were very few actual realistic
> examples of this, only made-up theoretical cases that wouldn't ever
> really be found in real code.

There is one particular scenario which concerns me in refcount_dec_and_test().
It relies on smp_acquire__after_ctrl_dep() to promote the control
dependency to an acquire barrier on success. Because it is exposed
within a static inline function, it hides the fact that control dependencies
are being used under the hood.

I have not identified a specific instance of oddly generated code, but this is
an accident waiting to happen. If we take this simplified refcount code
into godbolt.org and compile it for RISC-V rv64gc clang 12.0.1:

#define RISCV_FENCE(p, s) \
        __asm__ __volatile__ ("fence " #p "," #s : : : "memory")
#define __smp_rmb()     RISCV_FENCE(r,r)

volatile int var1;
volatile int refcount;

static inline bool refcount_dec_and_test(void)
{
    refcount--;
    if (refcount == 0) {
        __smp_rmb();    /* acquire after ctrl_dep */
        return true;
    }
    return false;
}

void fct(void)
{
    int x;

    if (refcount_dec_and_test()) {
        var1 = 0;
        return;
    }
    __smp_rmb();
    var1 = 1;
}

We end up with:

fct():                               # @fct()
        lui     a0, %hi(refcount)
        lw      a1, %lo(refcount)(a0)
        addiw   a1, a1, -1
        sw      a1, %lo(refcount)(a0)
        lw      a0, %lo(refcount)(a0)
        fence   r, r
        snez    a0, a0
        lui     a1, %hi(var1)
        sw      a0, %lo(var1)(a1)
        ret

Which lacks the conditional branch, and where the "fence r,r" instruction
does not properly order following stores after the refcount load.

Adding ctrl_dep() around the refcount == 0 check fixes this:

fct():                                # @fct()
        lui     a0, %hi(refcount)
        lw      a1, %lo(refcount)(a0)
        addiw   a1, a1, -1
        sw      a1, %lo(refcount)(a0)
        lw      a0, %lo(refcount)(a0)
        beqz    a0, .LBB0_2

        fence   r, r
        addi    a0, zero, 1
        j       .LBB0_3
.LBB0_2:

        fence   r, r
        mv      a0, zero
.LBB0_3:
        lui     a1, %hi(var1)
        sw      a0, %lo(var1)(a1)
        ret

I admit that this is still a "made up" example, although it is similar to the actual
implementation of refcount_dec_and_check(). But if we need to audit every user of this
API for wrongly generated code, we may be looking for a needle in a haystack.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com