Nicholas Piggin <npiggin@xxxxxxxxx> writes: > The rfscv instruction does not work correctly with the fake-suspend mode > in POWER9, which can end up with the hypervisor restoring an incorrect > checkpoint. If I understand correctly from commit 4bb3c7a0208f ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9"), this is because rfscv does not cause a soft-patch interrupt in the way that rfid etc do. So we need to avoid calling rfscv if we are in fake-suspend state - instead we must call something that does indeed get soft-patched - like rfid. > Work around this by setting the _TIF_RESTOREALL flag if a system call > returns to a transaction active state, causing rfid to be used instead > of rfscv to return, which will do the right thing. The contents of the > registers are irrelevant because they will be overwritten in this case > anyway. I can follow that this will indeed cause syscall_exit_prepare to return non-zero and therefore we should take the syscall_vectored_*_restore_regs path which does an RFID_TO_USER rather than a RFSCV_TO_USER. My only question/concern is: .Lsyscall_vectored_\name\()_exit: addi r4,r1,STACK_FRAME_OVERHEAD li r5,1 /* scv */ bl syscall_exit_prepare <-------- we get r3 != 0 here std r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */ .Lsyscall_vectored_\name\()_rst_start: lbz r11,PACAIRQHAPPENED(r13) andi. r11,r11,(~PACA_IRQ_HARD_DIS)@l bne- syscall_vectored_\name\()_restart <-- can we end up taking this branch? Are there any circumstances that would take us down the _restart path, and if so, will we still go through the correct RFID_TO_USER branch rather than the RFSCV_TO_USER branch? Apart from that this looks good to me, although with the heavy disclaimer that I only learned about fake suspend for the first time while reviewing the patch. Kind regards, Daniel > > Reported-by: Eirik Fuller <efuller@xxxxxxxxxx> > Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions") > Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx> > --- > arch/powerpc/kernel/interrupt.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c > index c77c80214ad3..917a2ac4def6 100644 > --- a/arch/powerpc/kernel/interrupt.c > +++ b/arch/powerpc/kernel/interrupt.c > @@ -139,6 +139,19 @@ notrace long system_call_exception(long r3, long r4, long r5, > */ > irq_soft_mask_regs_set_state(regs, IRQS_ENABLED); > > + /* > + * If system call is called with TM active, set _TIF_RESTOREALL to > + * prevent RFSCV being used to return to userspace, because POWER9 > + * TM implementation has problems with this instruction returning to > + * transactional state. Final register values are not relevant because > + * the transaction will be aborted upon return anyway. Or in the case > + * of unsupported_scv SIGILL fault, the return state does not much > + * matter because it's an edge case. > + */ > + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) && > + unlikely(MSR_TM_TRANSACTIONAL(regs->msr))) > + current_thread_info()->flags |= _TIF_RESTOREALL; > + > /* > * If the system call was made with a transaction active, doom it and > * return without performing the system call. Unless it was an > -- > 2.23.0