On Mon, Feb 13, 2023 at 07:36:42PM -0500, Joel Fernandes wrote:
> Thanks, I agree with most of your last email, just replying to one thing:
>
> > > ->rf does because of data flow causality, ->ppo does because of
> > > program structure, so that makes sense to be ->hb.
> > >
> > > IMHO, ->rfi should as well, because it is embodying a flow of data, so
> > > that is a bit confusing. It would be great to clarify more perhaps
> > > with an example about why ->rfi cannot be ->hb, in the
> > > "happens-before" section.
> >
> > Maybe. We do talk about store forwarding, and in fact the ppo section
> > already says:
> >
> > ------------------------------------------------------------------------
> > R ->dep W ->rfi R',
> >
> > where the dep link can be either an address or a data dependency. In
> > this situation we know it is possible for the CPU to execute R' before
> > W, because it can forward the value that W will store to R'.
> > ------------------------------------------------------------------------
>
> Thank you for pointing this out! In the text that follows this, in
> this paragraph:
>
> <quote>
> where the dep link can be either an address or a data dependency. In
> this situation we know it is possible for the CPU to execute R' before
> W, because it can forward the value that W will store to R'. But it
> cannot execute R' before R, because it cannot forward the value before
> it knows what that value is, or that W and R' do access the same
> location.
> </quote>
>
> The "in this situation" should be clarified that the "situation" is a
> data-dependency. Only in the case of data-dependency, the ->rfi
> cannot cause misordering if I understand it correctly. However, that
> sentence does not mention data-dependency explicitly. Or let me know
> if I missed something?

The text explicitly says that the dep link can be either an address or
a data dependency. In either case, R' cannot be reordered before R.
In theory this doesn't have to be true for address dependencies,
because the CPU might realize that W and R' access the same address
without knowing what that address is. However, I've been reliably
informed that no existing architectures do this sort of optimization.

The case of a control dependency is different, because the CPU can
speculate that W will be executed and can speculatively forward the
value from W to R' before it knows what value R will read.

Alan
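
For reference, the two shapes contrasted above (R ->dep W ->rfi R' with
a data dependency, and the same shape with a control dependency) might
be sketched roughly as follows. This is only an illustration: the
variables x and y, the function names, and the int-only READ_ONCE() and
WRITE_ONCE() stand-ins are all assumptions made so the fragment builds
in userspace. They are not the kernel's definitions and not a litmus
test from the memory model's documentation.

```c
/*
 * Userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(),
 * restricted to int.  Assumed for illustration only.
 */
#define READ_ONCE(v)     (*(const volatile int *)&(v))
#define WRITE_ONCE(v, n) (*(volatile int *)&(v) = (n))

int x, y;	/* hypothetical shared variables */

/* R ->data W ->rfi R' */
int data_dep_rfi(void)
{
	int r1, r2;

	r1 = READ_ONCE(x);	/* R */
	WRITE_ONCE(y, r1);	/* W: the value stored depends on R */
	/*
	 * R': the CPU can forward W's value here, but it cannot do so
	 * before it knows that value, so R' cannot execute before R.
	 */
	r2 = READ_ONCE(y);
	return r2;
}

/* R ->ctrl W ->rfi R' */
int ctrl_dep_rfi(void)
{
	int r1, r2;

	r1 = READ_ONCE(x);	/* R */
	if (r1)
		WRITE_ONCE(y, 1);	/* W: whether it runs depends on R */
	/*
	 * R': the CPU can speculate that W will execute and forward the
	 * constant 1 before it knows what value R will read.
	 */
	r2 = READ_ONCE(y);
	return r2;
}
```

Run on a single thread, both functions simply return the value most
recently stored to y. The difference is only in what the hardware
needs to know before it can satisfy R': the stored value in the first
case, versus nothing beyond a branch prediction in the second.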