Hi Laurent, I am looking to add support for speculative page fault handling to arm64 (effectively porting this patch) and had a few questions. Apologies if I've missed an obvious explanation for my queries. I'm jumping in bit late to the discussion. On Tue, Apr 17, 2018 at 3:33 PM, Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> wrote: > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > Try a speculative fault before acquiring mmap_sem, if it returns with > VM_FAULT_RETRY continue with the mmap_sem acquisition and do the > traditional fault. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> > > [Clearing of FAULT_FLAG_ALLOW_RETRY is now done in > handle_speculative_fault()] > [Retry with usual fault path in the case VM_ERROR is returned by > handle_speculative_fault(). This allows signal to be delivered] > [Don't build SPF call if !CONFIG_SPECULATIVE_PAGE_FAULT] > [Try speculative fault path only for multi threaded processes] > [Try reuse to the VMA fetch during the speculative path in case of retry] > [Call reuse_spf_or_find_vma()] > [Handle memory protection key fault] > Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> > --- > arch/x86/mm/fault.c | 42 ++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 38 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index 73bd8c95ac71..59f778386df5 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1220,7 +1220,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > struct mm_struct *mm; > int fault, major = 0; > unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > - u32 pkey; > + u32 pkey, *pt_pkey = &pkey; > > tsk = current; > mm = tsk->mm; > @@ -1310,6 +1310,30 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > flags |= FAULT_FLAG_INSTRUCTION; > > /* > + * Do not try speculative page fault for kernel's pages and if > + * the fault was due to protection keys since it can't be resolved. > + */ > + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT) && > + !(error_code & X86_PF_PK)) { You can simplify this condition by dropping the IS_ENABLED() check as you already provide an alternate implementation of handle_speculative_fault() when CONFIG_SPECULATIVE_PAGE_FAULT is not defined. > + fault = handle_speculative_fault(mm, address, flags, &vma); > + if (fault != VM_FAULT_RETRY) { > + perf_sw_event(PERF_COUNT_SW_SPF, 1, regs, address); > + /* > + * Do not advertise for the pkey value since we don't > + * know it. > + * This is not a matter as we checked for X86_PF_PK > + * earlier, so we should not handle pkey fault here, > + * but to be sure that mm_fault_error() callees will > + * not try to use it, we invalidate the pointer. > + */ > + pt_pkey = NULL; > + goto done; > + } > + } else { > + vma = NULL; > + } The else part can be dropped if vma is initialised to NULL when it is declared at the top of the function. > + > + /* > * When running in the kernel we expect faults to occur only to > * addresses in user space. All other faults represent errors in > * the kernel and should generate an OOPS. Unfortunately, in the > @@ -1342,7 +1366,8 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > might_sleep(); > } > > - vma = find_vma(mm, address); > + if (!vma || !can_reuse_spf_vma(vma, address)) > + vma = find_vma(mm, address); Is there a measurable benefit from reusing the vma? Dropping the vma reference unconditionally after speculative page fault handling gets rid of the implicit state when "vma != NULL" (increased ref-count). I found it a bit confusing to follow. > if (unlikely(!vma)) { > bad_area(regs, error_code, address); > return; > @@ -1409,8 +1434,15 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, > if (flags & FAULT_FLAG_ALLOW_RETRY) { > flags &= ~FAULT_FLAG_ALLOW_RETRY; > flags |= FAULT_FLAG_TRIED; > - if (!fatal_signal_pending(tsk)) > + if (!fatal_signal_pending(tsk)) { > + /* > + * Do not try to reuse this vma and fetch it > + * again since we will release the mmap_sem. > + */ > + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT)) > + vma = NULL; Regardless of the above comment, can the vma be reset here unconditionally? Thanks, Punit