Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 22 Feb 2019 15:16:35 -0800

On Fri, Feb 22, 2019 at 2:51 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> That's all fine. I'm missing rationale for making probe_kernel_read()
> fail on user addresses.

Because it already WON'T WORK in general!

> What is fundamentally wrong with a function probe_any_address_read() ?

What part of "the same pointer value can be a user address and a
kernel address" is not getting through?

The user address space and the kernel address space have separate page
tables on some architectures. We used to avoid it on x86, because
switching address spaces was expensive, but even on x86 some vendors
did it on 32-bit simply to get 4GB of user (and kernel) address space.
And now we end up doing it anyway just because of meltdown.

So a kernel pointer value of 0x12345678 could be a valid kernel
pointer pointing to some random kmalloc'ed kernel memory, and a user
pointer value of 0x12345678 could be a valid _user_ pointer pointing
to some user mapping.
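
To make that concrete, here is a minimal userspace sketch, not kernel
code. It models each address space as its own lookup table, roughly the
way separate page tables would behave. All the toy_* names and the
stored values (111, 222, 333) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: an address space is just a table of (address, value) pairs. */
struct toy_mapping {
    uintptr_t addr;
    int value;
};

struct toy_address_space {
    const struct toy_mapping *maps;
    size_t count;
};

/* Hypothetical contents: both spaces map the same numeric address. */
static const struct toy_mapping toy_kernel_maps[] = {
    { 0x12345678u, 111 },   /* stands in for some kmalloc'ed kernel object */
};
static const struct toy_mapping toy_user_maps[] = {
    { 0x12345678u, 222 },   /* stands in for some user mapping */
    { 0x00400000u, 333 },   /* mapped in user space only */
};

static const struct toy_address_space toy_kernel_space = { toy_kernel_maps, 1 };
static const struct toy_address_space toy_user_space = { toy_user_maps, 2 };

/* Returns 0 and stores the value on a hit, -1 if addr is unmapped in 'as'. */
static int toy_read(const struct toy_address_space *as, uintptr_t addr, int *out)
{
    for (size_t i = 0; i < as->count; i++) {
        if (as->maps[i].addr == addr) {
            *out = as->maps[i].value;
            return 0;
        }
    }
    return -1;
}
```

Reading 0x12345678 through toy_kernel_space and through toy_user_space
succeeds both times and yields different data. Nothing in the number
itself says which table to consult; the caller has to know.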

See?

If you access a user pointer, you need to use a user accessor function
(eg "get_user()"), while if you access a kernel pointer you need to
just dereference it directly (unless you can't trust it, in which case
you need to use a _different_ accessor function).
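
One way to picture that rule is the userspace sketch below, where the
address space travels with the pointer's type instead of its value.
This is loosely in the spirit of the kernel's __user annotation, but it
is only an illustration: the toy_* types and functions are
hypothetical stand-ins, not the real get_user() or probe_kernel_read().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct wrapper types, so the two kinds of pointer cannot be mixed up. */
struct toy_user_ptr   { uintptr_t addr; };
struct toy_kernel_ptr { uintptr_t addr; };

/* Hypothetical setup: one word mapped at the same address in each space. */
#define TOY_ADDR 0x12345678u
static const int toy_user_word = 222;
static const int toy_kernel_word = 111;

/* Stand-in for a user accessor: consults the user mapping only. */
static int toy_get_user(int *dst, struct toy_user_ptr p)
{
    if (p.addr != TOY_ADDR)
        return -EFAULT;     /* unmapped in the toy user space */
    *dst = toy_user_word;
    return 0;
}

/* Stand-in for a fault-tolerant kernel read: consults the kernel mapping only. */
static int toy_probe_kernel_read(int *dst, struct toy_kernel_ptr p)
{
    if (p.addr != TOY_ADDR)
        return -EFAULT;     /* unmapped in the toy kernel space */
    *dst = toy_kernel_word;
    return 0;
}
```

Passing a struct toy_user_ptr to toy_probe_kernel_read() does not
compile, which is the point: the choice of accessor is made by whoever
knows where the pointer came from, never by inspecting the value.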

The fact that user and kernel pointers happen to be distinct on x86-64
(right now) is just a random implementation detail.

Really.

I didn't realize how many people seem to have been confused about
this. But it's always been true. It's just that the common
architectures have had that "one single address space for both kernel
and user pointers" in practice.

In fact, the *very* first kernel version had separate address spaces
for kernel and user mode even on x86 (using segments, not paging). So
it has literally been true since day one in Linux that a kernel
address can be indistinguishable from a user address from a pure value
standpoint.

                 Linus