On 1/26/21 1:15 AM, Peter Zijlstra wrote:
On Mon, Jan 25, 2021 at 04:12:19PM -0800, Yonghong Song wrote:
When the test is run normally after login prompt, cpu_feature_enabled(X86_FEATURE_SMAP)
is true and bad_area_nosemaphore() is called and then fixup_exception() is called,
where bpf specific handler is able to fixup the exception.
But when the test is run at /sbin/init time, cpu_feature_enabled(X86_FEATURE_SMAP) is
false, the control reaches
That makes no sense, cpu features should be set in stone long before we
reach userspace.
You are correct. Sorry for misleading description. The reason is I use
two qemu script, one from my local environment and the other from ci
test since they works respectively. I thought they should have roughly
the same kernel setup, but apparently they are not.
For my local qemu script, I have "-cpu host" option and with this,
X86_FEATURE_SMAP is set up probably in function get_cpu_cap(), file
arch/x86/kernel/cpu/common.c.
For CI qemu script (in
https://lore.kernel.org/bpf/20210123004445.299149-1-kpsingh@xxxxxxxxxx/),
the "-cpu kvm64" is the qemu argument. This cpu does not
enable X86_FEATURE_SMAP, so we will see the kernel warning.
Changing CI script to use "-cpu host" resolved the issue. I think "-cpu
kvm64" may emulate old x86 cpus and ignore X86_FEATURE_SMAP.
Do we have any x64 cpus which does not support X86_FEATURE_SMAP?
For CI, with "-cpu kvm64" is good as it can specify the number
of cores with e.g. "-smp 4" regardless of underlying host # of cores.
I think we could change CI to use "-cpu host" by ensuring the CI host
having at least 4 cores.
Thanks.
To fix the issue, before the above mmap_read_trylock(), we will check
whether fault ip can be served by bpf exception handler or not, if
yes, the exception will be fixed up and return.
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..e8182d30bf67 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1317,6 +1317,15 @@ void do_user_addr_fault(struct pt_regs *regs,
if (emulate_vsyscall(hw_error_code, regs, address))
return;
}
+
+#ifdef CONFIG_BPF_JIT
+ /*
+ * Faults incurred by bpf program might need emulation, i.e.,
+ * clearing the dest register.
+ */
+ if (fixup_bpf_exception(regs, X86_TRAP_PF, hw_error_code, address))
+ return;
+#endif
#endif
NAK, this is broken. You're now disallowing faults that should've gone
through.