On Wed, May 2, 2018 at 2:06 PM Florian Weimer <fweimer@xxxxxxxxxx> wrote:

> On 05/02/2018 10:41 PM, Andy Lutomirski wrote:

> >> See above.  The signal handler will crash if it calls any non-local
> >> function through the GOT because with the default access rights, it's
> >> not readable in the signal handler.
> >
> >> Any use of memory protection keys for basic infrastructure will run into
> >> this problem, so I think the current kernel behavior is not very useful.
> >> It's also x86-specific.
> >
> >> From a security perspective, the atomic behavior is not very useful
> >> because you generally want to modify PKRU *before* computing the details
> >> of the memory access, so that you don't have a general “poke anywhere
> >> with this access right” primitive in the text segment.  (I called this
> >> the “suffix problem” in another context.)
> >
> >
> > Ugh, right.  It's been long enough that I forgot about the underlying
> > issue.  A big part of the problem here is that pkey_alloc() should set the
> > initial value of the key across all threads, but it *can't*.  There is
> > literally no way to do it in a multithreaded program that uses RDPKRU and
> > WRPKRU.

> The kernel could do *something*, probably along the membarrier system
> call.  I mean, I could implement a reasonable close approximation in
> userspace, via the setxid mechanism in glibc (but I really don't want to).

I beg to differ.

Thread A:
old = RDPKRU();
WRPKRU(old & ~3);
...
WRPKRU(old);

Thread B:
pkey_alloc().

If pkey_alloc() happens while thread A is in the ... part, you lose.  It
makes no difference what the kernel does.  The problem is that the WRPKRU
instruction itself is designed incorrectly.

> > But I think the right fix, at least for your use case, is to have a per-mm
> > init_pkru variable that starts as "deny all".  We'd add a new pkey_alloc()
> > flag like PKEY_ALLOC_UPDATE_INITIAL_STATE that causes the specified mode to
> > update init_pkru.  New threads and delivered signals would get the
> > init_pkru state instead of the hardcoded default.

> I implemented this for signal handlers:

> https://marc.info/?l=linux-api&m=151285420302698&w=2

> This does not alter the thread inheritance behavior yet.  I would have
> to investigate how to implement that.

> Feedback led to the current patch, though.  I'm not sure what has
> changed since then.

What feedback?  I think the old patch was much better than the new patch.
I could point out some issues in the kernel code, and I think it should
deal with thread creation, but otherwise I think it's the right approach.

Keep in mind, though, that it's just not possible to make pkey_alloc() work
on x86 in any sensible way in a multithreaded program.

> If I recall correctly, the POWER maintainer did express a strong desire
> back then for (what is, I believe) their current semantics, which my
> PKEY_ALLOC_SIGNALINHERIT patch implements for x86, too.

Ram, I really really don't like the POWER semantics.  Can you give some
justification for them?  Does POWER at least have an atomic way for
userspace to modify just the key it wants to modify or, even better,
special load and store instructions to use alternate keys?

--Andy
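
For illustration, here is a minimal sketch of the Thread A / Thread B
interleaving described above, written as a single-threaded C simulation
rather than real PKRU code.  The register is modeled as a plain variable,
and sim_thread, key_mask(), sim_broadcast_key_rights() and replay_race()
are hypothetical names introduced only for this sketch; none of them is a
kernel or glibc interface.  The only hardware detail assumed is the x86
layout of two PKRU bits per key (access-disable, then write-disable).

```c
#include <assert.h>
#include <stdint.h>

/* Simulated per-thread PKRU register.  Real code would use RDPKRU/WRPKRU;
 * this struct is an assumption made so the sketch runs anywhere. */
typedef struct { uint32_t pkru; } sim_thread;

/* Two bits per key: bit 2k is access-disable, bit 2k+1 is write-disable. */
static uint32_t key_mask(int pkey)
{
    return 3u << (2 * pkey);
}

/* Hypothetical stand-in for "the kernel sets the rights of a newly
 * allocated key in another thread's PKRU".  It rewrites only the two
 * bits that belong to pkey. */
static void sim_broadcast_key_rights(sim_thread *t, int pkey, uint32_t rights)
{
    t->pkru = (t->pkru & ~key_mask(pkey)) | ((rights & 3u) << (2 * pkey));
}

/* Replays the interleaving from the message and returns thread A's
 * final PKRU value. */
static uint32_t replay_race(uint32_t initial_pkru, int new_pkey,
                            uint32_t new_rights)
{
    sim_thread a = { initial_pkru };

    uint32_t old = a.pkru;        /* old = RDPKRU();                   */
    a.pkru = old & ~3u;           /* WRPKRU(old & ~3);                 */

    /* Thread B's pkey_alloc() lands here, in the "..." window. */
    sim_broadcast_key_rights(&a, new_pkey, new_rights);

    a.pkru = old;                 /* WRPKRU(old); overwrites all keys, */
                                  /* including the one just updated    */
    return a.pkru;
}
```

With an example starting value of 0x55555554 and a new key 1 that is
supposed to become fully accessible, replay_race(0x55555554u, 1, 0u)
returns 0x55555554: thread A's final whole-register write restores the
stale bits for key 1, so the update is lost regardless of how it was
delivered.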