Re: [PATCH V2] MIPS: implement smp_cond_load_acquire() for Loongson-3

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Tue, 10 Jul 2018 09:14:23 -0700

On Tue, Jul 10, 2018 at 02:17:27PM +0200, Peter Zijlstra wrote:
> 
> Please!! Learn to use email.
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> Also, wrap non-quoted lines to 78 characters.
> 
> On Tue, Jul 10, 2018 at 07:45:22PM +0800, 陈华才 wrote:
> > I'm afraid that you have missing something......
> > 
> > Firstly, our previous conclusion (READ_ONCE need a barrier to avoid
> > 'reads prioritised over writes') is totally wrong. So define
> > cpu_relax() to smp_mb() like ARM11MPCore is incorrect, even if it can
> > 'solve' Loongson's problem.  Secondly, I think the real problem is
> > like this:
> 
> >  1, CPU0 set the lock to 0, then do something;
> >  2, While CPU0 is doing something, CPU1 set the flag to 1 with
> >     WRITE_ONCE(), and then wait the lock become to 1 with a READ_ONCE()
> >     loop;
> >  3, After CPU0 complete its work, it wait the flag become to 1, and if
> >     so then set the lock to 1;
> >  4, If the lock becomes to 1, CPU1 will leave the READ_ONCE() loop.

Are there specific loops in the kernel whose conditions are controlled
by READ_ONCE() that don't contain cpu_relax(), smp_mb(), etc.?  One
way to find them given your description of your hardware is to make
cpu_relax() be smp_mb() as Peter suggests, and then run tests to find
the problems.

Or have you already done this?

							Thanx, Paul

> > If without SFB, everything is OK. But with SFB in step 2, a
> > READ_ONCE() loop is right after WRITE_ONCE(), which makes the flag
> > cached in SFB (so be invisible by other CPUs) for ever, then both CPU0
> > and CPU1 wait for ever.
> 
> Sure.. we all got that far. And no, this isn't the _real_ problem. This
> is a manifestation of the problem.
> 
> The problem is that your SFB is broken (per the Linux requirements). We
> require that stores will become visible. That is, they must not
> indefinitely (for whatever reason) stay in the store buffer.
> 
> > I don't think this is a hardware bug, in design, SFB will flushed to
> > L1 cache in three cases:
> 
> > 1, data in SFB is full (be a complete cache line);
> > 2, there is a subsequent read access in the same cache line;
> > 3, a 'sync' instruction is executed.
> 
> And I think this _is_ a hardware bug. You just designed the bug instead
> of it being by accident.
> 
> > In this case, there is no other memory access (read or write) between
> > WRITE_ONCE() and READ_ONCE() loop. So Case 1 and Case 2 will not
> > happen, and the only way to make the flag be visible is wbflush
> > (wbflush is sync in Loongson's case).
> >
> > I think this problem is not only happens on Loongson, but will happen
> > on other CPUs which have write buffer (unless the write buffer has a
> > 4th case to be flushed).
> 
> It doesn't happen an _any_ other architecture except that dodgy
> ARM11MPCore part. Linux hard relies on stores to become available
> _eventually_.
> 
> Still, even with the rules above, the best work-around is still the very
> same cpu_relax() hack.