Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1

Jeff Xu <jeffxu@xxxxxxxxxx> · Wed, 17 May 2023 16:48:00 -0700

On Wed, May 17, 2023 at 8:29 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/17/23 08:21, Jeff Xu wrote:
> >>> I’m not sure I follow the details, can you give an example of an asynchronous
> >>> mechanism to do this? E.g. would this be the kernel writing to the memory in a
> >>> syscall for example?
> >> I was thinking of all of the IORING_OP_*'s that can write to memory or
> >> aio(7).
> > IORING is challenging from security perspectives, for now, it is
> > disabled in ChromeOS. Though I'm not sure how aio is related ?
>
> Let's say you're the attacking thread and you're the *only* attacking
> thread.  You have three things at your disposal:
>
>  1. A benign thread doing aio_read()
>  2. An arbitrary write primitive
>  3. You can send signals to yourself
>  4. You can calculate where your signal stack will be
>
> You calculate the address of PKRU on the future signal stack.  You then
> leverage the otherwise benign aio_write() to write a 0 to that PKRU
> location.  Then, send a signal to yourself.  The attacker's PKRU value
> will be written to the stack.  If you can time it right, the AIO will
> complete while the signal handler is in progress and PKRU is on the
> stack.  On sigreturn, the kernel restores the aio_read()-placed,
> attacker-provided PKRU value.  Now the attacker has PKRU==0.  It
> effectively build a WRPKRU primitive out of those other pieces.
>
>
Ah, I understand the question now. Thanks for the explanation.
Signal handling is the next project that I will be working on.
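
For reference, the sequence described above can be modelled in a few
lines of userspace C. This is only a sketch under loud assumptions: the
"register" is a plain variable, the signal frame is a one-field struct,
and every name below (attack_frame, live_pkru, deliver_signal,
async_write_completes, sigreturn_restore) is hypothetical and not a
kernel or libc API. It shows only why restoring PKRU from user-writable
memory is a problem, not how the kernel implements signal delivery.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the part of a signal frame that holds PKRU. */
struct attack_frame {
    uint32_t pkru_slot;   /* user-writable copy of PKRU on the signal stack */
};

static uint32_t live_pkru;    /* stands in for the real PKRU register */

/* Signal delivery: the current PKRU is written to the signal stack. */
static void deliver_signal(struct attack_frame *f)
{
    f->pkru_slot = live_pkru;
}

/* An otherwise benign asynchronous write landing on an attacker-chosen address. */
static void async_write_completes(uint32_t *target, uint32_t value)
{
    assert(target);
    *target = value;
}

/* sigreturn: PKRU is restored from whatever the frame holds by now. */
static void sigreturn_restore(const struct attack_frame *f)
{
    live_pkru = f->pkru_slot;
}
```

If the asynchronous write lands between deliver_signal() and
sigreturn_restore(), the restored value is the attacker's, which is the
WRPKRU-like primitive described in the quoted text.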

I'm leaning towards saving the PKRU register to the thread struct,
similar to how a context switch works. This would address the attack
scenario you described, since PKRU would no longer be restored from
the user-writable stack.
However, there are a few challenges I have not yet worked through.
First, the code needs to track when the first signal entry occurs
(saving the PKRU register to the thread struct) and when the last one
returns (restoring the PKRU register from the thread struct). One way
to do this would be to add another member to the thread struct to
track the signal nesting level. Second, signals are used in error
handling, including in the kernel's own signal handling code path. I
haven't worked through this part of the code logic completely.
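
A minimal userspace sketch of the nesting-level idea, assuming a
simplified model in which the "register" is a struct field and the
handler runs with one fixed PKRU value. The names thread_model,
sig_depth, model_signal_enter and model_sigreturn are hypothetical and
not existing kernel code, and the PKRU values a caller passes in are
arbitrary. The sketch does not cover the error-handling paths mentioned
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state; not the kernel's thread struct. */
struct thread_model {
    uint32_t pkru;        /* stands in for the live PKRU register */
    uint32_t saved_pkru;  /* copy kept out of user-writable memory */
    unsigned sig_depth;   /* current signal nesting level */
};

/* Signal entry: only the outermost entry records the interrupted PKRU. */
static void model_signal_enter(struct thread_model *t, uint32_t handler_pkru)
{
    if (t->sig_depth++ == 0)
        t->saved_pkru = t->pkru;
    t->pkru = handler_pkru;
}

/*
 * sigreturn: frame_pkru is whatever sits on the user stack and is
 * deliberately ignored. Only the outermost return restores PKRU, and
 * it does so from the thread struct.
 */
static void model_sigreturn(struct thread_model *t, uint32_t frame_pkru)
{
    (void)frame_pkru;
    assert(t->sig_depth > 0);
    if (--t->sig_depth == 0)
        t->pkru = t->saved_pkru;
}
```

In this model, a zero written to the stack copy has no effect on the
value restored at the last return, because that value never leaves the
thread struct.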

If the first approach is too complicated or considered intrusive, I
could take a different approach. In this approach, I would not track
signal re-entry. Instead, I would modify the PKRU saved in the AltStack
while the signal is being delivered. The steps are:
a> save PKRU to a tmp variable.
b> modify PKRU to allow writing to the PKEY-protected AltStack.
c> XSAVE.
d> write tmp to the memory address of PKRU in the AltStack, at the
correct offset.
Since the thread's PKRU is saved to the stack, XRSTOR will restore the
thread's original PKRU during sigreturn in normal situations. This
approach might be a little hacky because it overwrites the XSAVE result.
If we go this route, I will need someone's help with the overwriting
function, since it is CPU specific.
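
Steps a> to d> can be sketched as the following userspace model. It is
an illustration only: the real XSAVE area layout and the PKRU offset
are CPU specific and are not modelled, the "register" is a plain
variable, and sigframe_model, model_pkru, model_xsave,
model_setup_frame and model_frame_sigreturn are hypothetical names, not
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frame on the AltStack; real XSAVE layout is CPU specific. */
struct sigframe_model {
    uint32_t xsave_pkru;  /* slot where XSAVE would place PKRU */
};

static uint32_t model_pkru;   /* stands in for the live PKRU register */

/* Stand-in for XSAVE: dumps the current register value into the frame. */
static void model_xsave(struct sigframe_model *frame)
{
    frame->xsave_pkru = model_pkru;
}

static void model_setup_frame(struct sigframe_model *frame,
                              uint32_t altstack_pkru)
{
    assert(frame);
    uint32_t tmp = model_pkru;    /* a> save PKRU to tmp */
    model_pkru = altstack_pkru;   /* b> allow writes to the AltStack */
    model_xsave(frame);           /* c> XSAVE (records the modified PKRU) */
    frame->xsave_pkru = tmp;      /* d> overwrite the slot with tmp */
}

/* Stand-in for XRSTOR during sigreturn. */
static void model_frame_sigreturn(const struct sigframe_model *frame)
{
    model_pkru = frame->xsave_pkru;
}
```

The handler runs with the AltStack-writable PKRU, while the frame holds
the interrupted thread's value, so an unmodified frame restores the
original PKRU on return.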
However, this approach will not work if an attacker can install its own
signal handler (and thereby gain the ability to overwrite the PKRU
stored on the stack, as you described). The application will want to
install all of its signal handlers with a PKEY-protected AltStack at
startup time and disallow installing additional handlers after that.
This is programmatically achievable in V8, as Stephan mentioned.

I would appreciate more comments on those two approaches to signal
handling. Or are there better ways to do what we want? Do you think we
could continue the signal handling discussion in the original thread
that Kees started [1]? There was already a lot of discussion about
signal handling there, so it would be easier for future readers to
understand the context. I can repost there. Or I can start a new thread
for signal handling; I'm worried that these discussions will get
lengthy and that the context will get lost with patch version updates.

Although the signal handling project is related, I think the VMA
protection using PKRU project can stand on its own. We could solve
this for V8 first and then move on to signal handling. The work here
could also pave the way for adding mseal() in the future; I expect much
of the code logic will be similar.

[1] https://lore.kernel.org/all/202208221331.71C50A6F@keescook/

Thanks!
Best regards,
-Jeff Xu
