Re: [PATCH 1/2] MIPS: Loongson, add sync before target of branch between llsc

huangpei <huangpei@xxxxxxxxxxx> · Sat, 12 Jan 2019 16:19:55 +0800



On Sat, 12 Jan 2019 16:02:40 +0800 (GMT+08:00)
徐成华 <xuchenghua@xxxxxxxxxxx> wrote:

> > > For Loongson 3A1000 and 3A3000, when a memory access instruction
> > > (load, store, or prefetch)'s executing occurs between the
> > > execution of LL and SC, the success or failure of SC is not
> > > predictable.  Although programmer would not insert memory access
> > > instructions between LL and SC, the memory instructions before LL
> > > in program-order, may dynamically executed between the execution
> > > of LL/SC, so a memory fence(SYNC) is needed before LL/LLD to
> > > avoid this situation.
> > > 
> > > Since 3A3000, we improved our hardware design to handle this case.
> > > But we later deduce a rarely circumstance that some speculatively
> > > executed memory instructions due to branch misprediction between
> > > LL/SC still fall into the above case, so a memory fence(SYNC) at
> > > branch-target(if its target is not between LL/SC) is needed for
> > > 3A1000 and 3A3000.  
> > 
> > Thank you - that description is really helpful.
> > 
> > I have a few follow-up questions if you don't mind:
> > 
> >  1) Is it correct to say that the only consequence of the bug is
> > that an SC might fail when it ought to have succeeded?  

here is an example:

both cpu1 and cpu2 simutaneously run atomic_add by 1 on same
variable,  this bug cause both sc run  by two cpus (in atomic_add)
succeed at same time( sc return 1), and the variable is only added by 1,
which is wrong and unacceptable.( it should be added by 2)

I think sc do it wrong, instead of failing to to it;

> 
> Unfortunately, the SC succeeded when it should fail that cause a
> functional error. 
> >  2) Does that mean placing a sync before the LL is purely a
> > performance optimization? ie. if we don't have the sync & the SC
> > fails then we'll retry the LL/SC anyway, and this time not have the
> > reordered instruction from before the LL to cause a problem.  
> 
> It's functional bug not performance bug.
> 
> >  3) In the speculative execution case would it also work to place a
> > sync before the branch instruction, instead of at the branch
> > target? In some cases this might be nicer since the workaround
> > would be contained within the LL/SC loop, but I guess it could
> > potentially add more overhead if the branch is conditional & not
> > taken.  
> 
> Yes, it more overhead so we don't use that.


> 
> >  4) When we talk about branches here, is it really just branch
> >     instructions that are affected or will the CPU speculate past
> > jump instructions too?  
>  
> No, bug only expose when real program-order is still ll/sc,
> unconditional branch or jump is not really ll/sc, so it not affected.
> 
> > I just want to be sure that we work around this properly, and
> > document it in the kernel so that it's clear to developers why the
> > workaround exists & how to avoid introducing bugs for these CPUs in
> > future. 
> > > Our processor is continually evolving and we aim to to remove all
> > > these workaround-SYNCs around LL/SC for new-come processor.   
> > 
> > I'm very glad to hear that :)
> > 
> > I hope one day I can get my hands on a nice Loongson laptop to test
> > with.  
> 
> We can ship one to you as a gift when the laptop is stable.
> 
> > Thanks,
> >     Paul  
> 
> 
> --
> 
> 
> 
> 
>