On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote:
> On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote:
>
> > Right, I was thinking about something unrelated.. but how about the
> > following case:
> >
> > 	local_v = &y;
> > 	r1 = READ_ONCE(*x); // f
> >
> > 	if (r1 == 1) {
> > 		local_v = &y; // e
> > 	} else {
> > 		local_v = &z; // d
> > 	}
> >
> > 	p = READ_ONCE(local_v); // g
> >
> > 	r2 = READ_ONCE(*p); // h
> >
> > if r1 == 1, we definitely think we have:
> >
> > 	f ->ctrl e ->rfi g ->addr h
> >
> > , and if we treat ctrl;rfi as "to-r", then we have "f" happens before
> > "h". However, the compiler can optimize the above as:
> >
> > 	local_v = &y;
> >
> > 	r1 = READ_ONCE(*x); // f
> >
> > 	if (r1 != 1) {
> > 		local_v = &z; // d
> > 	}
> >
> > 	p = READ_ONCE(local_v); // g
> >
> > 	r2 = READ_ONCE(*p); // h
> >
> > , and when this gets executed, I don't think we have the guarantee
> > that "f" happens before "h", because the CPU can do optimistic reads
> > for "g" and "h".
>
> In your example, which accesses are supposed to be to actual memory and
> which to registers?  Also, remember that the memory model assumes the

Given that we use READ_ONCE() on local_v, local_v should be a memory
location, but one accessed only by this thread.

> hardware does not reorder loads if there is an address dependency
> between them.
>

Right, so "g" won't be reordered after "h".

> > Part of this is because when we take plain accesses into
> > consideration, we won't guarantee that a read-from or other relation
> > exists if compiler optimization happens.
> >
> > Maybe I'm missing something subtle, but just try to think through the
> > effect of making dep; rfi as "to-r".
>
> Forget about local variables for the time being and just consider
>
> 	dep ; [Plain] ; rfi
>
> For example:
>
> 	A: r1 = READ_ONCE(x);
> 	   y = r1;
> 	B: r2 = READ_ONCE(y);
>
> Should B be ordered after A?  I don't see how any CPU could hope to
> execute B before A, but maybe I'm missing something.
>

Agreed.

> There's another twist, connected with the fact that herd7 can't detect
> control dependencies caused by unexecuted code.  If we have:
>
> 	A: r1 = READ_ONCE(x);
> 	if (r1)
> 		WRITE_ONCE(y, 5);
> 	r2 = READ_ONCE(y);
> 	B: WRITE_ONCE(z, r2);
>
> then in executions where x == 0, herd7 doesn't see any control
> dependency.  But CPUs do see control dependencies whenever there is a
> conditional branch, whether the branch is taken or not, and so they will
> never reorder B before A.
>

Right, because B in this example is a write. What if B is a read that
depends on r2, like in my example? Let y be a pointer to a memory
location, initialized to a valid value (pointing to a valid memory
location), and change your example to:

	A: r1 = READ_ONCE(x);
	if (r1)
		WRITE_ONCE(y, 5);
	C: r2 = READ_ONCE(y);
	B: r3 = READ_ONCE(*r2);

Then A doesn't have a control dependency to B, because A and B are
read+read. So B can be ordered before A, right?

> One last thing to think about: My original assessment of Björn's problem
> wasn't right, because the dep in (dep ; rfi) doesn't include control
> dependencies.  Only data and address.  So I believe that the LKMM

Ah, right. I was missing that part (ctrl is not in dep). So I guess my
example is pointless for the question we are discussing here ;-(

> wouldn't consider A to be ordered before B in this example even if x
> was nonzero.

Yes, and similarly for my example (changing B to a read).

I did try to run my example with herd, and got confused: whether or not
I made dep; [Plain]; rfi part of to-r, I got the same result, telling me
that a reordering can happen. Now the reason is clear: this is a
ctrl; rfi, not a dep; rfi.

Thanks so much for walking with me on this ;-)

Regards,
Boqun

>
> Alan
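
The conclusion above (ctrl; rfi is not covered by dep; rfi) can be
checked by hand with a toy relational model of the four events f, e, g
and h from the first example. This is only a sketch: it is not herd7 and
not the LKMM cat file, and the names rel, rel_union(), rel_seq() and
f_before_h() are hypothetical helpers made up for illustration. The one
thing taken from the discussion is that dep contains only address and
data dependencies.

```c
#include <stdbool.h>

/* The four events of the first example, for the r1 == 1 execution. */
enum { F, E, G, H, NEVENTS };

/* A binary relation over events, stored as an adjacency matrix. */
typedef struct { bool m[NEVENTS][NEVENTS]; } rel;

/* Union of two relations (the "|" operator in cat files). */
static rel rel_union(rel a, rel b)
{
	rel r = {{{false}}};

	for (int i = 0; i < NEVENTS; i++)
		for (int j = 0; j < NEVENTS; j++)
			r.m[i][j] = a.m[i][j] || b.m[i][j];
	return r;
}

/* Sequential composition of two relations (the ";" operator). */
static rel rel_seq(rel a, rel b)
{
	rel r = {{{false}}};

	for (int i = 0; i < NEVENTS; i++)
		for (int k = 0; k < NEVENTS; k++)
			for (int j = 0; j < NEVENTS; j++)
				if (a.m[i][k] && b.m[k][j])
					r.m[i][j] = true;
	return r;
}

/*
 * Returns true if (X ; rfi ; addr) links f to h, where X is dep when
 * include_ctrl is false and (dep | ctrl) when include_ctrl is true.
 */
bool f_before_h(bool include_ctrl)
{
	rel ctrl = {{{false}}}, rfi = {{{false}}};
	rel addr = {{{false}}}, data = {{{false}}};

	ctrl.m[F][E] = true;	/* f ->ctrl e */
	rfi.m[E][G] = true;	/* e ->rfi  g */
	addr.m[G][H] = true;	/* g ->addr h */

	rel dep = rel_union(addr, data);	/* dep = addr | data, no ctrl */
	rel x = include_ctrl ? rel_union(dep, ctrl) : dep;

	return rel_seq(rel_seq(x, rfi), addr).m[F][H];
}
```

In this model f_before_h(true) finds the f -> h link through ctrl; rfi;
addr, while f_before_h(false) finds none, because the only edge leaving
f is a control dependency and dep does not contain it.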