Hello,

On Fri, Jan 11, 2019 at 08:40:49PM +0800, 徐成华 wrote:
> For Loongson 3A1000 and 3A3000, when a memory access instruction
> (load, store, or prefetch)'s executing occurs between the execution of
> LL and SC, the success or failure of SC is not predictable. Although
> programmer would not insert memory access instructions between LL and
> SC, the memory instructions before LL in program-order, may
> dynamically executed between the execution of LL/SC, so a memory
> fence(SYNC) is needed before LL/LLD to avoid this situation.
>
> Since 3A3000, we improved our hardware design to handle this case.
> But we later deduce a rarely circumstance that some speculatively
> executed memory instructions due to branch misprediction between LL/SC
> still fall into the above case, so a memory fence(SYNC) at
> branch-target(if its target is not between LL/SC) is needed for 3A1000
> and 3A3000.

Thank you - that description is really helpful. I have a few follow-up
questions if you don't mind:

 1) Is it correct to say that the only consequence of the bug is that an
    SC might fail when it ought to have succeeded?

 2) Does that mean placing a sync before the LL is purely a performance
    optimization? i.e. if we don't have the sync & the SC fails then
    we'll retry the LL/SC anyway, and this time not have the reordered
    instruction from before the LL to cause a problem.

 3) In the speculative execution case would it also work to place a sync
    before the branch instruction, instead of at the branch target? In
    some cases this might be nicer since the workaround would be
    contained within the LL/SC loop, but I guess it could potentially
    add more overhead if the branch is conditional & not taken.

 4) When we talk about branches here, is it really just branch
    instructions that are affected or will the CPU speculate past jump
    instructions too?

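To make question 2 concrete, here is a rough sketch of the kind of retry
loop I have in mind. It is portable C11 rather than kernel code or MIPS
assembly, and sketch_atomic_add_return() is a made-up name. The weak
compare-exchange is allowed to fail spuriously, which is roughly how an SC
that failed when it ought to have succeeded would show up to software.
Whether that is really the only failure mode is exactly what question 1
asks, so treat it as an assumption here:

```c
#include <stdatomic.h>

/*
 * Hypothetical helper, not a kernel API: atomically add `delta` to `*v`
 * and return the new value.
 */
static int sketch_atomic_add_return(atomic_int *v, int delta)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * On an LL/SC machine the weak compare-exchange is typically built
     * from an LL/SC pair and may fail spuriously. On failure `old` is
     * refreshed with the current value and we simply go round again.
     * ASSUMPTION: a spurious failure is the only misbehaviour; the loop
     * would not protect against an SC that wrongly succeeds.
     */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + delta,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
        ;

    return old + delta;
}
```

If that assumption holds, a loop of this shape stays correct without any
extra SYNC, and the fence would only save wasted iterations.
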
I just want to be sure that we work around this properly, and document it
in the kernel so that it's clear to developers why the workaround exists &
how to avoid introducing bugs for these CPUs in future.

> Our processor is continually evolving and we aim to to remove all
> these workaround-SYNCs around LL/SC for new-come processor.

I'm very glad to hear that :)

I hope one day I can get my hands on a nice Loongson laptop to test with.

Thanks,
    Paul