Re: [PATCH] n_tty: Add memory barrier to fix race condition in receive path

Christian Riesch <christian.riesch@xxxxxxxxxx> · Fri, 7 Nov 2014 07:50:46 +0100

[sent again due to stupid HTML mail problems, sorry]

On Thu, Nov 6, 2014 at 11:54 PM, Måns Rullgård <mans@xxxxxxxxx> wrote:
> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>
>> On Thu, Nov 06, 2014 at 10:12:54PM +0000, Måns Rullgård wrote:
>>> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>>>
>>> > On Thu, Nov 06, 2014 at 09:38:59PM +0000, Måns Rullgård wrote:
>>> >> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>>> >>
>>> >> > On Thu, Nov 06, 2014 at 09:01:36PM +0000, Måns Rullgård wrote:
>>> >> >> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>>> >> >>
>>> >> >> > On Thu, Nov 06, 2014 at 08:49:01PM +0000, Måns Rullgård wrote:
>>> >> >> >> Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
>>> >> >> >>
>>> >> >> >> > On Thu, Nov 06, 2014 at 12:39:59PM +0100, Christian Riesch wrote:
>>> >> >> >> >> The current implementation of put_tty_queue() causes a race condition
>>> >> >> >> >> when re-arranged by the compiler.
>>> >> >> >> >>
>>> >> >> >> >> On my build with gcc 4.8.3, cross-compiling for ARM, the line
>>> >> >> >> >>
>>> >> >> >> >>  *read_buf_addr(ldata, ldata->read_head++) = c;
>>> >> >> >> >>
>>> >> >> >> >> was re-arranged by the compiler to something like
>>> >> >> >> >>
>>> >> >> >> >>  x = ldata->read_head
>>> >> >> >> >>  ldata->read_head++
>>> >> >> >> >>  *read_buf_addr(ldata, x) = c;
>>> >> >> >> >>
>>> >> >> >> >> which causes a race condition. Invalid data is read if data is read
>>> >> >> >> >> before it is actually written to the read buffer.
>>> >> >> >> >
>>> >> >> >> > Really?  A compiler can rearrange things like that and expect things to
>>> >> >> >> > actually work?  How is that valid?
>>> >> >> >>
>>> >> >> >> This is actually required by the C spec.  There is a sequence point
>>> >> >> >> before a function call, after the arguments have been evaluated.  Thus
>>> >> >> >> all side-effects, such as the post-increment, must be complete before
>>> >> >> >> the function is called, just like in the example.
>>> >> >> >>
>>> >> >> >> There is no "re-arranging" here.  The code is simply wrong.
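
The sequence-point rule described above can be checked with a small
userspace sketch. This is not the n_tty code: struct demo_ring, slot() and
put() are made-up names for illustration, with slot() standing in for
read_buf_addr(). The helper records the head value it observes when it is
entered, which shows that the post-increment has already taken effect by
the time the call happens.

```c
#include <stddef.h>

/* Hypothetical stand-in for the ldisc data; not the kernel structure. */
struct demo_ring {
	size_t head;        /* plays the role of read_head */
	size_t seen_head;   /* head as observed inside slot() */
	char buf[8];
};

/* Stand-in for read_buf_addr(): returns the address of slot i. */
static char *slot(struct demo_ring *r, size_t i)
{
	r->seen_head = r->head;   /* record what the callee sees */
	return &r->buf[i & 7];
}

/* Same shape as the line under discussion. */
static void put(struct demo_ring *r, char c)
{
	/*
	 * The argument r->head++ is fully evaluated, including its side
	 * effect, before slot() is entered. The store of c through the
	 * returned pointer happens only after the call returns.
	 */
	*slot(r, r->head++) = c;
}
```

Starting from head == 0, a single put() writes to buf[0], and slot()
observes head == 1, so the index is bumped before the data is stored.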
>>> >> >> >
>>> >> >> > Ah, ok, time to dig out the C spec...
>>> >> >> >
>>> >> >> > Anyway, because of this, no need for the wmb() calls, just rearrange the
>>> >> >> > logic and all should be good, right?  Christian, can you test that
>>> >> >> > instead?
>>> >> >>
>>> >> >> Weakly ordered SMP systems probably need some kind of barrier.  I didn't
>>> >> >> look at it carefully.
>>> >> >
>>> >> > It shouldn't need a barrier, as it is a sequence point with the function
>>> >> > call.  Well, it's an inline function, but that "shouldn't" matter here,
>>> >> > right?
>>> >>
>>> >> Sequence points say nothing about the order in which stores become
>>> >> visible to other CPUs.  That's why there are barrier instructions.
>>> >
>>> > Yes, but "order" matters.
>>> >
>>> > If I write code that does:
>>> >
>>> > 100        x = ldata->read_head;
>>> > 101        ldata->read_buf[x & SOME_VALUE] = y;
>>> > 102        ldata->read_head++;
>>> >
>>> > the compiler cannot reorder lines 102 and 101 just because it feels
>>> > like it, right?  Or is it time to go spend some time reading the C spec
>>> > again...
>>>
>>> The compiler can't.  The hardware can.  All the hardware promises is
>>> that at some unspecified time in the future, both memory locations will
>>> have the correct values.  Another CPU might see 'read_head' updated
>>> before it sees the corresponding data value.  A wmb() between the writes
>>> forces the CPU to complete preceding stores before it begins subsequent
>>> ones.
>>
>> Yes, sorry, I'm not talking about other CPUs and what they see, I'm
>> talking about the local one.  I'm not assuming that this is SMP "safe"
>> at all.  If it is supposed to be, then yes, we do have problems, but
>> there should be a lock _somewhere_ protecting this.
>
> Within the confines of a single CPU + memory, barriers are never needed.
> The moment another CPU or master-capable peripheral enters the mix,
> proper ordering must be enforced somehow.
>
> If the buffer is already protected by a lock of some kind, this will
> provide the necessary barriers, so nothing further is necessary.  If
> it's a lock-less design, there will need to be barriers somewhere.

It was changed to lock-less with 3.12 in commit
6d76bd2618535c581f1673047b8341fd291abc67 ("n_tty: Make N_TTY ldisc receive
path lockless"). So I will try to read the memory barrier docs again.

Of course my little ARM system is not an SMP system, but I guess this
should also be fixed for the SMP case, right?
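
To make the SMP question concrete, here is a rough userspace sketch of the
ordering I think is needed: store the data first, then publish the new
head. It is only an analogy, not a patch. It uses C11 atomics
(release/acquire) as a stand-in for a kernel write barrier between the two
stores and a matching read barrier on the reader side. struct ring,
ring_put(), ring_get() and BUF_SIZE are made-up names, and there is no
buffer-full check.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 4096   /* hypothetical size; must be a power of two */

struct ring {
	unsigned char buf[BUF_SIZE];
	atomic_size_t head;   /* written only by the producer */
	size_t tail;          /* touched only by the consumer */
};

/* Producer: write the byte, then make the new head visible. */
static void ring_put(struct ring *r, unsigned char c)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

	r->buf[head & (BUF_SIZE - 1)] = c;
	/* Release: the data store above is ordered before the head store. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
}

/* Consumer: returns 1 and fills *c if a byte is available, else 0. */
static int ring_get(struct ring *r, unsigned char *c)
{
	/* Acquire: pairs with the release store in ring_put(). */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (r->tail == head)
		return 0;
	*c = r->buf[r->tail++ & (BUF_SIZE - 1)];
	return 1;
}
```

With this ordering a reader that sees the incremented head is also
guaranteed to see the byte that belongs to it, which is exactly what goes
wrong when the increment lands before the data store.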

Thanks,
Christian