Re: [PATCH 5.10.y] x86/mm: Remove broken vsyscall emulation code from the page fault code

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Wed, 12 Jun 2024 15:57:01 +0200

On Sun, Jun 02, 2024 at 12:24:34PM -0300, Guilherme G. Piccoli wrote:
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> 
> commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
> 
> The syzbot-reported stack trace from hell in this discussion thread
> actually has three nested page faults:
> 
>   https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@xxxxxxxxxx
> 
> ... and I think that's actually the important thing here:
> 
>  - the first page fault is from user space, and triggers the vsyscall
>    emulation.
> 
>  - the second page fault is from __do_sys_gettimeofday(), and that should
>    just have caused the exception that then sets the return value to
>    -EFAULT
> 
>  - the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
>    preempt_schedule() -> trace_sched_switch(), which then causes a BPF
>    trace program to run, which does that bpf_probe_read_compat(), which
>    causes that page fault under pagefault_disable().
> 
> It's quite the nasty backtrace, and there's a lot going on.
> 
> The problem is literally the vsyscall emulation, which sets
> 
>         current->thread.sig_on_uaccess_err = 1;
> 
> and that causes the fixup_exception() code to send the signal *despite* the
> exception being caught.
> 
> And I think that is in fact completely bogus.  It's completely bogus
> exactly because it sends that signal even when it *shouldn't* be sent -
> like for the BPF user mode trace gathering.
> 
> In other words, I think the whole "sig_on_uaccess_err" thing is entirely
> broken, because it makes any nested page-faults do all the wrong things.
> 
> Now, arguably, I don't think anybody should enable vsyscall emulation any
> more, but this test case clearly does.
> 
> I think we should just make the "send SIGSEGV" be something that the
> vsyscall emulation does on its own, not this broken per-thread state for
> something that isn't actually per thread.
> 
> The x86 page fault code actually tried to deal with the "incorrect nesting"
> by having that:
> 
>                 if (in_interrupt())
>                         return;
> 
> which ignores the sig_on_uaccess_err case when it happens in interrupts,
> but as shown by this example, these nested page faults do not need to be
> about interrupts at all.
> 
> IOW, I think the only right thing is to remove that horrendously broken
> code.
> 
> The attached patch looks like the ObviouslyCorrect(tm) thing to do.
> 
> NOTE! This broken code goes back to this commit in 2011:
> 
>   4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
> 
> ... and back then the reason was to get all the siginfo details right.
> Honestly, I do not for a moment believe that it's worth getting the siginfo
> details right here, but part of the commit says:
> 
>     This fixes issues with UML when vsyscall=emulate.
> 
> ... and so my patch to remove this garbage will probably break UML in this
> situation.
> 
> I do not believe that anybody should be running with vsyscall=emulate in
> 2024 in the first place, much less if you are doing things like UML. But
> let's see if somebody screams.
> 
> Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Tested-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> Acked-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@xxxxxxxxxxxxxx
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> [gpiccoli: Backport the patch due to differences in the trees. The main change
> between 5.10.y and 5.15.y is due to renaming the fixup function, by
> commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()").
> 
> Following 2 commits cause divergence in the diffs too (in the removed lines):
> cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
> d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey")
> 
> Finally, there is context adjustment in the processor.h file.]
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
> ---
> 
> 
> Hi folks, this was backported by AUTOSEL up to 5.15.y; I'm manually submitting
> the backport to 5.4.y and 5.10.y. I've briefly described the changes needed
> because of other, unrelated missing patches, but these are really simple and
> non-intrusive. Nevertheless, I've explicitly CCed the x86 ML to be sure the
> maintainers are aware of the backport, and if anybody thinks we shouldn't
> do it for these (very) old releases, please respond here.

Both now queued up, thanks.

greg k-h
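
For readers following the quoted commit message, here is a minimal user-space
sketch of why a per-thread "send a signal on uaccess error" flag misbehaves
when an unrelated, already-handled fault nests inside the emulation. It is a
toy model only, not kernel code: every toy_* name is hypothetical, the "fault"
is just a function call, and the "new" variant merely illustrates the idea of
letting the emulation decide on the signal from its own return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EFAULT 14 /* numeric value of EFAULT on Linux; used as -TOY_EFAULT */

/* Hypothetical stand-in for per-thread state; not the kernel's thread_struct. */
struct toy_thread {
    bool sig_on_uaccess_err; /* models the flag removed by the patch */
    int signals_sent;        /* counts simulated SIGSEGV deliveries */
};

/* Old scheme: the fixup path consults the per-thread flag on every fault. */
int toy_fixup_old(struct toy_thread *t)
{
    if (t->sig_on_uaccess_err)
        t->signals_sent++;   /* fires even for faults the flag was not meant for */
    return -TOY_EFAULT;
}

/* New scheme: the fixup path only reports the error to its caller. */
int toy_fixup_new(struct toy_thread *t)
{
    (void)t;
    return -TOY_EFAULT;
}

/* Models an unrelated nested fault (e.g. a tracing probe) that is fully handled. */
void toy_nested_probe(struct toy_thread *t, int (*fixup)(struct toy_thread *))
{
    assert(fixup != NULL);
    (void)fixup(t);          /* the probe swallows the error; no signal is wanted */
}

/* Old emulation: sets the flag, so the nested probe fault sends a signal. */
int toy_emulate_old(struct toy_thread *t)
{
    t->sig_on_uaccess_err = true;
    toy_nested_probe(t, toy_fixup_old);
    t->sig_on_uaccess_err = false;
    return t->signals_sent;
}

/* New emulation: no flag; a signal is sent only if the emulated call itself failed. */
int toy_emulate_new(struct toy_thread *t, bool syscall_faulted)
{
    int ret;

    toy_nested_probe(t, toy_fixup_new);
    ret = syscall_faulted ? -TOY_EFAULT : 0;
    if (ret == -TOY_EFAULT)
        t->signals_sent++;
    return t->signals_sent;
}
```

In this model, toy_emulate_old() on a zero-initialized toy_thread records one
signal even though only the probe faulted, while toy_emulate_new() records a
signal only when syscall_faulted is true.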