On Sat, 12 Jan 2019 16:02:40 +0800 (GMT+08:00) 徐成华 <xuchenghua@xxxxxxxxxxx> wrote:
> > > For Loongson 3A1000 and 3A3000, when a memory access instruction
> > > (load, store, or prefetch) executes between the execution of LL
> > > and SC, the success or failure of SC is not predictable. Although
> > > a programmer would not insert memory access instructions between
> > > LL and SC, memory instructions that precede LL in program order
> > > may be dynamically executed between the execution of LL and SC,
> > > so a memory fence (SYNC) is needed before LL/LLD to avoid this
> > > situation.
> > >
> > > Since 3A3000, we improved our hardware design to handle this case.
> > > But we later deduced a rare circumstance in which some memory
> > > instructions, speculatively executed between LL and SC due to
> > > branch misprediction, still fall into the above case, so a memory
> > > fence (SYNC) at the branch target (if its target is not between
> > > LL/SC) is needed for 3A1000 and 3A3000.
> >
> > Thank you - that description is really helpful.
> >
> > I have a few follow-up questions if you don't mind:
> >
> > 1) Is it correct to say that the only consequence of the bug is
> > that an SC might fail when it ought to have succeeded?

Here is an example: CPU1 and CPU2 simultaneously run atomic_add by 1
on the same variable. This bug causes the SC on both CPUs (in
atomic_add) to succeed at the same time (SC returns 1), yet the
variable is only incremented by 1, which is wrong and unacceptable
(it should be incremented by 2). So I think the SC does the wrong
thing, rather than failing to do it.

> Unfortunately, the SC succeeded when it should have failed, which
> causes a functional error.
>
> > 2) Does that mean placing a sync before the LL is purely a
> > performance optimization? ie. if we don't have the sync & the SC
> > fails then we'll retry the LL/SC anyway, and this time not have the
> > reordered instruction from before the LL to cause a problem.
>
> It's a functional bug, not a performance bug.
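
A toy model may make the difference between "SC fails spuriously" and
"SC succeeds wrongly" concrete. The Python sketch below is not kernel
code and does not model the hardware cause (a memory access executing
between LL and SC); it only reproduces the observable outcome of the
atomic_add example. The names ToyLLSC and racing_atomic_add are
hypothetical, and the "buggy" flag is just an assumed way of letting a
stale link survive another CPU's store.

```python
class ToyLLSC:
    """Toy single-location LL/SC model (illustrative, not real hardware)."""

    def __init__(self, value=0, buggy=False):
        self.value = value
        self.buggy = buggy      # if True, a store leaves other CPUs' links intact
        self.linked = set()     # CPUs that currently hold a valid link

    def ll(self, cpu):
        self.linked.add(cpu)
        return self.value

    def sc(self, cpu, new_value):
        if cpu not in self.linked:
            return 0                    # SC fails, memory is unchanged
        self.value = new_value
        if self.buggy:
            self.linked.discard(cpu)    # other CPUs wrongly keep their link
        else:
            self.linked.clear()         # a successful store breaks every link
        return 1


def racing_atomic_add(buggy):
    """Two CPUs run atomic_add(1); both execute LL before either executes SC."""
    mem = ToyLLSC(buggy=buggy)
    loaded = {cpu: mem.ll(cpu) for cpu in (1, 2)}   # both CPUs read 0
    first_try = {}
    for cpu in (1, 2):
        ok = mem.sc(cpu, loaded[cpu] + 1)
        first_try[cpu] = ok
        while not ok:                               # the usual LL/SC retry loop
            ok = mem.sc(cpu, mem.ll(cpu) + 1)
    return first_try, mem.value


if __name__ == "__main__":
    print(racing_atomic_add(buggy=False))
    print(racing_atomic_add(buggy=True))
```

In the correct case the second SC fails on its first attempt, the loop
retries, and the variable ends at 2. In the buggy case both first
attempts report success and the variable ends at 1, so no retry loop
can repair it. That is why the sync is a correctness fix rather than
an optimization.
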
>
> > 3) In the speculative execution case would it also work to place a
> > sync before the branch instruction, instead of at the branch
> > target? In some cases this might be nicer since the workaround
> > would be contained within the LL/SC loop, but I guess it could
> > potentially add more overhead if the branch is conditional & not
> > taken.
>
> Yes, it adds more overhead, so we don't use that.
>
> > 4) When we talk about branches here, is it really just branch
> > instructions that are affected or will the CPU speculate past
> > jump instructions too?
>
> No, the bug is only exposed when the real program order is still
> ll/sc; an unconditional branch or jump is not really ll/sc, so it is
> not affected.
>
> > I just want to be sure that we work around this properly, and
> > document it in the kernel so that it's clear to developers why the
> > workaround exists & how to avoid introducing bugs for these CPUs in
> > future.
> >
> > > Our processor is continually evolving and we aim to remove all
> > > these workaround-SYNCs around LL/SC for new processors.
> >
> > I'm very glad to hear that :)
> >
> > I hope one day I can get my hands on a nice Loongson laptop to test
> > with.
>
> We can ship one to you as a gift when the laptop is stable.
>
> > Thanks,
> > Paul
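
The placement rule described in this thread (a sync before every
LL/LLD, plus a sync at the target of a conditional branch that sits
between LL and SC when that target lies outside the LL/SC sequence)
can be sketched as a toy pass over a list of instructions. This is
only an illustration under assumptions of mine: the (label, op,
target) tuple format, the function names, and the small set of branch
mnemonics are hypothetical, delay slots are ignored, and it is not how
any real assembler or the kernel implements the workaround.

```python
LL_OPS = {"ll", "lld"}
SC_OPS = {"sc", "scd"}
CONDITIONAL_BRANCHES = {"beq", "bne", "beqz", "bnez"}   # illustrative subset


def find_regions(insns):
    """Return (ll_index, sc_index) pairs for each LL..SC sequence."""
    regions, start = [], None
    for i, (_, op, _) in enumerate(insns):
        if op in LL_OPS:
            start = i
        elif op in SC_OPS and start is not None:
            regions.append((start, i))
            start = None
    return regions


def add_llsc_syncs(insns):
    """insns: list of (label, op, branch_target) tuples; returns a new list."""
    label_at = {label: i for i, (label, _, _) in enumerate(insns) if label}
    ll_indices = {i for i, (_, op, _) in enumerate(insns) if op in LL_OPS}

    # Targets of conditional branches inside LL..SC that leave the sequence.
    target_indices = set()
    for start, end in find_regions(insns):
        for _, op, target in insns[start + 1:end]:
            if op in CONDITIONAL_BRANCHES:
                t = label_at[target]
                if not start <= t <= end:
                    target_indices.add(t)

    out = []
    for i, (label, op, target) in enumerate(insns):
        if i in target_indices:
            out.append((label, "sync", None))   # the label now lands on the sync
            label = None
        elif i in ll_indices:
            out.append((None, "sync", None))    # fence ahead of LL/LLD
        out.append((label, op, target))
    return out


# A cmpxchg-shaped sequence: the bne leaves the LL/SC sequence early.
cmpxchg = [
    ("1", "ll", None),
    (None, "bne", "2"),
    (None, "move", None),
    (None, "sc", None),
    (None, "beqz", "1"),    # retry branch, located after the SC
    ("2", "nop", None),
]

if __name__ == "__main__":
    for label, op, target in add_llsc_syncs(cmpxchg):
        print(f"{(label + ':') if label else '':4}{op} {target or ''}".rstrip())
```

For the cmpxchg-shaped input, the pass adds one sync ahead of the ll
and one at label 2, the exit target of the bne. The retry branch after
the SC is left alone because it is not between LL and SC.
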