Re: Page faults in tracepoint caused by aliased pointer

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Thu, 22 Feb 2024 17:27:01 +0800

Hi,

On 2/13/2024 8:33 AM, Kumar Kartikeya Dwivedi wrote:
> On Tue, 13 Feb 2024 at 01:21, Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
>> On Mon, Feb 12, 2024 at 5:52 PM Alexei Starovoitov
>> <alexei.starovoitov@xxxxxxxxx> wrote:
>>> On Mon, Feb 12, 2024 at 3:42 PM Kumar Kartikeya Dwivedi
>>> <memxor@xxxxxxxxx> wrote:
>>>> On Tue, 13 Feb 2024 at 00:34, Alexei Starovoitov
>>>> <alexei.starovoitov@xxxxxxxxx> wrote:
>>>>> On Mon, Feb 12, 2024 at 3:16 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
>>>>>> [288931.217143][T109754] CPU: 4 PID: 109754 Comm: bpftrace Not tainted 6.6.16+ #10
>>>>> ...
>>>>>> [288931.217143][T109754]  ? copy_from_kernel_nofault+0x1d/0xe0
>>>>>> [288931.217143][T109754]  bpf_probe_read_compat+0x6a/0x90
>>>>>>
>>>>>> And Jakub CCed here did it for 6.8.0-rc2+
>>>>> I suspect something is broken in your kernels.
>>>>> Above is doing generic copy_from_kernel_nofault(),
>>>>> so one should be able to crash the kernel without any bpf.
>>>>>
>>>>> We have this in selftests/bpf:
>>>>> __weak noinline struct file *bpf_testmod_return_ptr(int arg)
>>>>> {
>>>>>         static struct file f = {};
>>>>>
>>>>>         switch (arg) {
>>>>>         case 1: return (void *)EINVAL;          /* user addr */
>>>>>         case 2: return (void *)0xcafe4a11;      /* user addr */
>>>>>         case 3: return (void *)-EINVAL;         /* canonical, but invalid */
>>>>>         case 4: return (void *)(1ull << 60);    /* non-canonical and invalid */
>>>>>         case 5: return (void *)~(1ull << 30);   /* trigger extable */
>>>>>         case 6: return &f;                      /* valid addr */
>>>>>         case 7: return (void *)((long)&f | 1);  /* kernel tricks */
>>>>>         default: return NULL;
>>>>>         }
>>>>> }
>>>>> where we check that the extables set up by the JIT for bpf progs are working correctly.
>>>>> You should see the kernel crashing when you just run bpf selftests.
>>>> I agree, this appears unrelated to BPF since it is happening when
>>>> using copy_from_kernel_nofault (which should be jumping to the Efault
>>>> label instead of the oops), but I think it's not specific to some
>>>> custom kernel. I can reproduce it on my dev machine on top of bpf-next
>>>> as well, and another machine with Ubuntu's generic 6.5 kernel for
>>>> 24.04. And I think Ignat tried it on the mainline 6.8-rc2 as well.
>> copy_from_kernel_nofault is called in Jakub's reproducer, but the
>> panic in our production case appears to be a direct memory access,
>> according to the JITed code dumped by bpftool. Will faults from such
>> instructions also be handled correctly?
>>
> Yep, since faults in both cases end up in the page fault handler.
> Once the fix pointed out by Alexei is applied, it should address both scenarios.

I don't see how the vsyscall patch [1] would fix the unhandled page
fault caused by a BTF pointer dereference. As I understand it, for a BTF
pointer dereference the code generated by the x86 JIT checks at runtime
whether the address is a kernel-space address. If it is, the JIT sets up
an exception fix-up entry for the load and performs the dereference
directly. A vsyscall address passes this check, so the JIT treats it as
a kernel-space address and dereferences it directly. Dereferencing the
vsyscall page from the kernel triggers a page fault: handle_page_fault()
is invoked, which calls do_user_addr_fault(), which in turn calls
page_fault_oops() instead of applying the fix-up entry.

[1]:
https://patchwork.kernel.org/project/netdevbpf/patch/20240202103935.3154011-3-houtao@xxxxxxxxxxxxxxx/

>
>> Yan
>>
>>> Then it must be the vsyscall address that this series is fixing:
>>> https://patchwork.kernel.org/project/netdevbpf/patch/20240202103935.3154011-3-houtao@xxxxxxxxxxxxxxx/
>>>
>>> We're still waiting on x86 maintainers to ack them.