Hello, On Tue, Jul 10, 2018 at 02:17:27PM +0200, Peter Zijlstra wrote: > > 1, CPU0 set the lock to 0, then do something; > > 2, While CPU0 is doing something, CPU1 set the flag to 1 with > > WRITE_ONCE(), and then wait the lock become to 1 with a READ_ONCE() > > loop; > > 3, After CPU0 complete its work, it wait the flag become to 1, and if > > so then set the lock to 1; > > 4, If the lock becomes to 1, CPU1 will leave the READ_ONCE() loop. > > > If without SFB, everything is OK. But with SFB in step 2, a > > READ_ONCE() loop is right after WRITE_ONCE(), which makes the flag > > cached in SFB (so be invisible by other CPUs) for ever, then both CPU0 > > and CPU1 wait for ever. > > Sure.. we all got that far. And no, this isn't the _real_ problem. This > is a manifestation of the problem. > > The problem is that your SFB is broken (per the Linux requirements). We > require that stores will become visible. That is, they must not > indefinitely (for whatever reason) stay in the store buffer. For the record, my understanding is that this doesn't really comply with the MIPS architecture either. It doesn't cover store buffers explicitly but does cover the more general notion of caches. >From section 2.8.8.2 "Cached Memory Access" of the Introduction to the MIPS64 Architecture document, revision 5.04 [1]: > In a cached access, physical memory and all caches in the system > containing a copy of the physical location are used to resolve the > access. A copy of a location is coherent if the copy was placed in the > cache by a cached coherent access; a copy of a location is noncoherent > if the copy was placed in the cache by a cached noncoherent access. > (Coherency is dictated by the system architecture, not the processor > implementation.) > > Caches containing a coherent copy of the location are examined and/or > modified to keep the contents of the location coherent. It is not > possible to predict whether caches holding a noncoherent copy of the > location will be examined and/or modified during a cached coherent > access. All of the accesses Linux is performing are cached coherent. I'm not sure which is the intent (I can ask if someone's interested), but you could either: 1) Consider the store buffer a cache, in which case loads need to check all store buffers from all CPUs because of the "all caches" part of the first quoted sentence. or 2) Decide store buffers aren't covered by the MIPS architecture documentation at all in which case the only sane thing to do would be to make it transparent to software (and here Loongson's isn't). > > I don't think this is a hardware bug, in design, SFB will flushed to > > L1 cache in three cases: > > > 1, data in SFB is full (be a complete cache line); > > 2, there is a subsequent read access in the same cache line; > > 3, a 'sync' instruction is executed. > > And I think this _is_ a hardware bug. You just designed the bug instead > of it being by accident. Presuming that the data in the SFB isn't visible to other cores, and presuming that case 2 is only covering loads from the same core that contains the SFB, I agree that this doesn't seem like valid behavior. Thanks, Paul [1] https://www.mips.com/?do-download=introduction-to-the-mips64-architecture-v5-04