On Wed, May 17, 2023 at 8:29 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/17/23 08:21, Jeff Xu wrote:
> >>> I’m not sure I follow the details, can you give an example of an asynchronous
> >>> mechanism to do this? E.g. would this be the kernel writing to the memory in a
> >>> syscall for example?
> >> I was thinking of all of the IORING_OP_*'s that can write to memory or
> >> aio(7).
> > IORING is challenging from security perspectives, for now, it is
> > disabled in ChromeOS. Though I'm not sure how aio is related ?
>
> Let's say you're the attacking thread and you're the *only* attacking
> thread. You have four things at your disposal:
>
> 1. A benign thread doing aio_read()
> 2. An arbitrary write primitive
> 3. You can send signals to yourself
> 4. You can calculate where your signal stack will be
>
> You calculate the address of PKRU on the future signal stack. You then
> leverage the otherwise benign aio_read() to write a 0 to that PKRU
> location. Then, send a signal to yourself. The attacker's PKRU value
> will be written to the stack. If you can time it right, the AIO will
> complete while the signal handler is in progress and PKRU is on the
> stack. On sigreturn, the kernel restores the aio_read()-placed,
> attacker-provided PKRU value. Now the attacker has PKRU==0. It
> effectively builds a WRPKRU primitive out of those other pieces.
>

Ah, I understand the question now, thanks for the explanation.

Signal handling is the next project that I will be working on. I'm leaning towards saving the PKRU register to the thread struct, similar to how a context switch works. This will address the attack scenario you described. However, there are a few challenges I have not yet worked through. First, the code needs to track when the first signal entry occurs (saving the PKRU register to the thread struct) and when the last signal handler returns (restoring the PKRU register from the thread struct).
One way to do this would be to add another member to the thread struct to track the level of signal re-entry. Second, signals are used in error handling, including the kernel's own signal handling code path. I haven't worked through this part of the code logic completely.

If the first approach is too complicated or considered intrusive, I could take a different approach. In this approach, I would not track signal re-entry. Instead, I would modify the PKRU saved in the AltStack during handling of the signal. The steps are:
a> save PKRU to a tmp variable.
b> modify PKRU to allow writing to the PKEY-protected AltStack.
c> XSAVE.
d> write tmp to the memory address of PKRU in the AltStack, at the correct offset.
Since the thread's PKRU is saved to the stack, XRSTOR will restore the thread's original PKRU during sigreturn in normal situations. This approach might be a little hacky because it overwrites the XSAVE results. If we go this route, I need someone's help with the overwriting function, since it is CPU specific.

However, this approach will not work if an attacker can install its own signal handler (and thereby gain the ability to overwrite the PKRU stored on the stack, as you described). The application will want to install all of its signal handlers with a PKEY-protected AltStack at startup time and disallow additional signal handlers after that. This is programmatically achievable in V8, as Stephan mentioned.

I would appreciate more comments in the signal handling area on those two approaches, or are there better ways to do what we want? Do you think we could continue the signal handling discussion in the original thread that Kees started [1]? There has already been a lot of discussion there about signal handling, so it will be easier for future readers to understand the context. I can repost there. Or I can start a new thread for signal handling; I'm worried that those discussions will get lengthy and the context will get lost with patch version updates.
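
To make steps a> to d> of the second approach a bit more concrete, here is a rough user-space model in C. It is only a sketch: every name in it (model_cpu, model_sigframe, model_setup_sigframe, model_sigreturn, model_roundtrip) is hypothetical, a plain struct field stands in for the PKRU register and for the PKRU slot in the XSAVE area, and the PKRU values are arbitrary. The real overwrite in step d> depends on the CPU-specific XSAVE layout, which this does not attempt to model.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model only. Every name here is hypothetical; nothing below
 * is a kernel API, and the real XSAVE layout is CPU specific.
 */
struct model_cpu {
    uint32_t pkru;      /* stands in for the live PKRU register */
};

struct model_sigframe {
    uint32_t pkru;      /* stands in for the PKRU slot XSAVE fills in
                           on the AltStack */
};

/* Signal delivery, following steps a> to d>. */
void model_setup_sigframe(struct model_cpu *cpu,
                          struct model_sigframe *frame,
                          uint32_t altstack_pkru)
{
    assert(cpu && frame);

    uint32_t tmp = cpu->pkru;   /* a> save PKRU to a tmp variable */

    cpu->pkru = altstack_pkru;  /* b> allow writes to the AltStack */
    frame->pkru = cpu->pkru;    /* c> XSAVE records the modified PKRU */
    frame->pkru = tmp;          /* d> overwrite the slot with tmp */
}

/* sigreturn: XRSTOR reloads PKRU from whatever the frame holds. */
void model_sigreturn(struct model_cpu *cpu,
                     const struct model_sigframe *frame)
{
    assert(cpu && frame);
    cpu->pkru = frame->pkru;
}

/* Deliver one signal and return the PKRU seen after sigreturn. */
uint32_t model_roundtrip(uint32_t thread_pkru, uint32_t altstack_pkru,
                         int frame_tampered)
{
    struct model_cpu cpu = { .pkru = thread_pkru };
    struct model_sigframe frame = { .pkru = 0 };

    model_setup_sigframe(&cpu, &frame, altstack_pkru);
    if (frame_tampered)
        frame.pkru = 0;  /* the asynchronous write from the quoted attack */
    model_sigreturn(&cpu, &frame);
    return cpu.pkru;
}
```

In this model, model_roundtrip(0x5555555c, 0x55555550, 0) gives back the thread's original 0x5555555c, while a non-zero frame_tampered gives 0. The second case is the one the PKEY-protected AltStack is meant to rule out.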

Although the signal handling project is related, I think the VMA protection using PKRU project can stand on its own. We could solve this for V8 first and then move on to signal handling. The work here could also pave the way to adding mseal() in the future; I expect a lot of the code logic will be similar.

[1] https://lore.kernel.org/all/202208221331.71C50A6F@keescook/

Thanks!
Best regards,
-Jeff Xu