On Sun, Jun 02, 2024 at 12:24:34PM -0300, Guilherme G. Piccoli wrote:
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
>
> The syzbot-reported stack trace from hell in this discussion thread
> actually has three nested page faults:
>
>   https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@xxxxxxxxxx
>
> ... and I think that's actually the important thing here:
>
>  - the first page fault is from user space, and triggers the vsyscall
>    emulation.
>
>  - the second page fault is from __do_sys_gettimeofday(), and that should
>    just have caused the exception that then sets the return value to
>    -EFAULT
>
>  - the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
>    preempt_schedule() -> trace_sched_switch(), which then causes a BPF
>    trace program to run, which does that bpf_probe_read_compat(), which
>    causes that page fault under pagefault_disable().
>
> It's quite the nasty backtrace, and there's a lot going on.
>
> The problem is literally the vsyscall emulation, which sets
>
>         current->thread.sig_on_uaccess_err = 1;
>
> and that causes the fixup_exception() code to send the signal *despite* the
> exception being caught.
>
> And I think that is in fact completely bogus. It's completely bogus
> exactly because it sends that signal even when it *shouldn't* be sent -
> like for the BPF user mode trace gathering.
>
> In other words, I think the whole "sig_on_uaccess_err" thing is entirely
> broken, because it makes any nested page-faults do all the wrong things.
>
> Now, arguably, I don't think anybody should enable vsyscall emulation any
> more, but this test case clearly does.
>
> I think we should just make the "send SIGSEGV" be something that the
> vsyscall emulation does on its own, not this broken per-thread state for
> something that isn't actually per thread.
>
> The x86 page fault code actually tried to deal with the "incorrect nesting"
> by having that:
>
>                 if (in_interrupt())
>                         return;
>
> which ignores the sig_on_uaccess_err case when it happens in interrupts,
> but as shown by this example, these nested page faults do not need to be
> about interrupts at all.
>
> IOW, I think the only right thing is to remove that horrendously broken
> code.
>
> The attached patch looks like the ObviouslyCorrect(tm) thing to do.
>
> NOTE! This broken code goes back to this commit in 2011:
>
>   4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
>
> ... and back then the reason was to get all the siginfo details right.
> Honestly, I do not for a moment believe that it's worth getting the siginfo
> details right here, but part of the commit says:
>
>     This fixes issues with UML when vsyscall=emulate.
>
> ... and so my patch to remove this garbage will probably break UML in this
> situation.
>
> I do not believe that anybody should be running with vsyscall=emulate in
> 2024 in the first place, much less if you are doing things like UML. But
> let's see if somebody screams.
>
> Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Tested-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> Acked-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@xxxxxxxxxxxxxx
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> [gpiccoli: Backport the patch due to differences in the trees. The main change
> between 5.10.y and 5.15.y is due to renaming the fixup function, by
> commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()").
>
> Following 2 commits cause divergence in the diffs too (in the removed lines):
> cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
> d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey")
>
> Finally, there is context adjustment in the processor.h file.]
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
> ---
>
>
> Hi folks, this was backported by AUTOSEL up to 5.15.y; I'm manually submitting
> the backport to 5.4.y and 5.10.y. I've detailed a bit the changes necessary
> due to other unrelated missing patches, but these are really simple and
> non-intrusive. Nevertheless, I've explicitly CCed x86 ML to be sure the
> maintainers are aware of the backport, and if anybody thinks we shouldn't
> do it for these (very) old releases, please respond here.

Both now queued up, thanks.

greg k-h