On Wed, May 2, 2018 at 3:08 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > On 05/02/2018 02:23 PM, Andy Lutomirski wrote: > >> The kernel could do *something*, probably along the membarrier system > >> call. I mean, I could implement a reasonable close approximation in > >> userspace, via the setxid mechanism in glibc (but I really don't want to). > > > > I beg to differ. > > > > Thread A: > > old = RDPKRU(); > > WRPKRU(old & ~3); > > ... > > WRPKRU(old); > > > > Thread B: > > pkey_alloc(). > > > > If pkey_alloc() happens while thread A is in the ... part, you lose. It > > makes no difference what the kernel does. The problem is that the WRPKRU > > instruction itself is designed incorrectly. > Yes, *if* we define pkey_alloc() to be implicitly changing other > threads' PKRU value. I think that's the only generally useful behavior. In general, if libraries are going to use protection keys, they're going to do this: int key = pkey_alloc(...); mmap() or mprotect_key() some memory (for crypto secrets, a database, whatever). ... enable_write_access(key); write to the memory; disable_write_access(key); That library wants other threads, signal handlers, and, in general, the whole rest of the process to be restricted, and that library doesn't want race conditions. The problem here is that, to get this right, we either need the PKRU modifications to be syscalls or to take locks, and the lock approach is going to be fairly gross. > Let's say we go to the hardware guys and ask for a new instruction to > fix this. We're going to have to make a pretty good case that this is > either impossible or really hard to do in software. > Surely we have the locking to tell another thread that we want its PKRU > value to change without actively going out and having the kernel poke a > new value in. I think that doing it in userspace without a new instruction is going to be slow, ugly, unreliable, or more than one of the above. Here's my proposal. We ask the hardware folks for new instructions. Once we get tentative agreement, we add new vDSO pkey accessors. At first those accessors function will do a syscall. When we get the new instructions, they get alternatives. The vDSO helpers could look like this: typedef struct { u64 opaque; } pkey_save_t; pkey_save_t pkey_save_and_set(int key, unsigned int mode); void pkey_set(int key, unsigned int mode); void pkey_restore(int key, pkey_save_t prev); Slight variants are possible. On new hardware, pkey_set() and pkey_restore() use the WRPKRU replacement. pkey_save_and_set() uses the RDPKRU replacement followed by the WRPKRU replacement. The usage is: pkey_save_t prev = pkey_save_and_set(my_key, 0); /* enable read and write */ do something with memory; pkey_restore(my_key, prev); or just: pkey_set(my_key, 0); ... pkey_set(my_key, 3); /* or 1 or 2 depending on the application */ And we make it very, very clear to user code that using RDPKRU and WRPKRU directly is a big no-no. What do you all think? This gives us sane, if slow, behavior for current CPUs and is decently fast on hypothetical new CPUs.