On Sat, May 30, 2020 at 11:09:42PM +0800, laokz wrote:
> Hi Paul,
>
> Many thanks for your light!
>
> On 2020-05-30 Sat 05:43 -0700, Paul E. McKenney wrote:
> > On Sat, May 30, 2020 at 06:36:37PM +0800, laokz wrote:
> > > Hello Paul,
> > >
> > > This is a bit longer story, I am still searching and stuck in the mist :-)
> > > Hope to get light from you. Thanks!
> > >
> > > I commented out smp_mb() from
> > > tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus.
> > >
> > > P0(int *x, int *y)
> > > {
> > >     int r0;
> > >
> > >     r0 = READ_ONCE(*x);
> > >     if (r0)
> > >         WRITE_ONCE(*y, 1);
> > > }
> > >
> > > P1(int *x, int *y)
> > > {
> > >     int r0;
> > >
> > >     r0 = READ_ONCE(*y);
> > >     // smp_mb();
> > >     WRITE_ONCE(*x, 1);
> > > }
> > >
> > > And I am confused that LKMM says the outcome P0:r0=1 /\ P1:r0=1 exists.
> > >
> > > I want to clear up these questions:
> > >
> > > 1. Is there any 'out-of-order commit/retirement' CPU among the
> > > Linux-supported architectures? If yes, which one? Then the following
> > > is trivial.
> >
> > The powerpc architecture allows prior reads to be reordered with
> > subsequent writes. To see this, point your browser here:
> >
> > https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC
> >
> > And "Select POWER Test" LB -> ctrl+po. You will then have this:
> >
> > PPC LB+ctrl+po
> > "DpCtrldW Rfe PodRW Rfe"
> > Cycle=Rfe PodRW Rfe DpCtrldW
> > {
> > 0:r2=x; 0:r4=y;
> > 1:r2=y; 1:r4=x;
> > }
> >  P0            | P1            ;
> >  lwz r1,0(r2)  | lwz r1,0(r2)  ;
> >  cmpw r1,r1    | li r3,1       ;
> >  beq LC00      | stw r3,0(r4)  ;
> >  LC00:         |               ;
> >  li r3,1       |               ;
> >  stw r3,0(r4)  |               ;
> > exists
> > (0:r1=1 /\ 1:r1=1)
> >
> > You then should be able to easily force the P0:r0=1 /\ P1:r0=1 outcome
> > after clicking on the "Interactive" button. (Hint: First commit Thread 1's
> > "li" instruction, then its "stw" instruction, then all of Thread 0's
> > instructions, and then Thread 1's remaining "lwz" instruction.)
>
> I followed your pointer.
> Yes, it showed the same result as the litmus test I was asking about.
>
> > > 2. READ_ONCE and WRITE_ONCE ensure that the compiler respects program
> > > order.
> > > If P0:r0=1, then it must have observed P1's write to x (earlier in wall
> > > time than P0:r0).
> > > If P1's write to x happened (committed, so visible to the outside), then
> > > its read from y must have happened before it, because of the CPU's
> > > in-order commit/retirement restriction (earlier in wall time than P1's
> > > write).
> > > Then how can the even earlier P1:r0 get the value 1?
> >
> > On powerpc architecture, it can. But don't take my word for it, try
> > it out on the website listed above. ;-)
>
> From this website, I found
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/pldi105-sarkar.pdf. It gave
> me a clue in section 8, page 11:
>
> >> Specifically: the model allows instructions to **commit out of program
> order**, which permits the LB and LB+rs test outcomes (not observed in
> practice);...
>
> It sounds reasonable to me. Now I try to conclude: if the CPU were
> implemented with in-order commit, then the result of my test in question
> (after commenting out P1's smp_mb()) would be forbidden. Can I?

CPUs are quite a bit more complicated than that, and there are a lot of
ways that things can get out of order. One mechanism is, as you say,
instruction commit order. Another is the store buffer. Yet another is
invalidation queues. A fourth is the cache coherence protocol.

Appendix C of "Is Parallel Programming Hard, And, If So, What Can You Do
About It?" gives more details.

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

							Thanx, Paul
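
The question in this thread comes down to whether P1's store to x may take
effect before P1's load from y. The following is a minimal sketch in C of a
toy model, not LKMM, herd7, or ppcmem: memory is a pair of plain variables,
each thread takes two steps, and the only freedom modeled is that one
reordering. The names run_schedule() and lb_outcome_reachable() are
hypothetical and exist only for this illustration. It enumerates all six
interleavings of the four steps and reports whether P0:r0=1 /\ P1:r0=1 can
be reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Count the set bits in a 4-bit schedule mask. */
static int count_bits(unsigned mask)
{
    int n = 0;

    for (int i = 0; i < 4; i++)
        n += (mask >> i) & 1u;
    return n;
}

/*
 * Run one interleaving. Bit i of 'schedule' set means slot i belongs to
 * P0, clear means P1. Returns true if both loads observed the value 1.
 */
static bool run_schedule(unsigned schedule, bool p1_store_first)
{
    int x = 0, y = 0;
    int p0_r0 = 0, p1_r0 = 0;
    int p0_step = 0, p1_step = 0;

    for (int i = 0; i < 4; i++) {
        if (schedule & (1u << i)) {
            /* P0: r0 = READ_ONCE(*x); if (r0) WRITE_ONCE(*y, 1); */
            if (p0_step == 0)
                p0_r0 = x;
            else if (p0_r0)
                y = 1;
            p0_step++;
        } else {
            /* P1: r0 = READ_ONCE(*y); WRITE_ONCE(*x, 1); maybe reordered */
            bool is_store = p1_store_first ? (p1_step == 0) : (p1_step == 1);

            if (is_store)
                x = 1;
            else
                p1_r0 = y;
            p1_step++;
        }
    }
    /* Every schedule passed in must give each thread exactly two slots. */
    assert(p0_step == 2 && p1_step == 2);
    return p0_r0 == 1 && p1_r0 == 1;
}

/* True if some interleaving yields P0:r0=1 and P1:r0=1. */
bool lb_outcome_reachable(bool p1_store_first)
{
    for (unsigned schedule = 0; schedule < 16; schedule++) {
        if (count_bits(schedule) != 2)
            continue;
        if (run_schedule(schedule, p1_store_first))
            return true;
    }
    return false;
}
```

In this toy model, lb_outcome_reachable(false) finds no interleaving that
produces the outcome, while lb_outcome_reachable(true) finds the one from
the hint above: P1's store, then both of P0's steps, then P1's load. It
covers only the first of the mechanisms listed in the reply; store buffers,
invalidation queues, and the cache coherence protocol are not modeled.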