On 3/8/19 10:08 AM, Sebastian Andrzej Siewior wrote:
> On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
>>> +	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
>>> +		return;
>>> +
>>> +	if (current->mm) {
>>> +		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
>>> +		WARN_ON_ONCE(!pk);
>>
>> This can trip on us if the 'init optimization' is in play because
>> get_xsave_addr() checks xsave->header.xfeatures.  That's unlikely today
>> because we usually set PKRU to a restrictive value.  But, it's also not
>> *guaranteed*.
>>
>> Userspace could easily do an XRSTOR that puts PKRU back in its init
>> state if it wanted to, then this would end up with pk==NULL.
>>
>> We might actually want a selftest that *does* that.  I don't think we
>> have one.
>
> So you are saying that the above warning might trigger and be "okay"?

Nothing will break, but the warning will trigger, which isn't nice.

> My understanding is that the in-kernel XSAVE will always save everything
> so we should never "lose" the XFEATURE_PKRU no matter what user space
> does.
>
> So as test case you want
> 	xsave (-1 & ~XFEATURE_PKRU)
> 	xrestore (-1 & ~XFEATURE_PKRU)
>
> in userland and then a context switch to see if the warning above
> triggers?

I think you need an XRSTOR with RFBM=-1 (or at least with the PKRU bit
set) and the PKRU bit in the XFEATURES field in the XSAVE buffer set to 0.

>>> +		if (pk)
>>> +			pkru_val = pk->pkru;
>>> +	}
>>> +	__write_pkru(pkru_val);
>>>  }
>>
>> A comment above __write_pkru() would be nice to say that it only
>> actually does the slow instruction on changes to the value.
>
> Could we please not do this? It is a comment above one of the caller
> functions and we have two or three. And we have that comment already
> within __write_pkru().

I looked at this code and thought "writing PKRU is slow", and "this
writes PKRU unconditionally", and "the __ version of the function
shouldn't have much logic in it".

I got 2/3 wrong.  To me that means this site needs a 1-line comment.
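To make the point concrete, here is a small userspace model of the
"only do the slow write when the value changes" behavior described
above.  It is a sketch, not the kernel code: every model_* name is
hypothetical, the register is a plain variable, and the counter exists
only so the skipped write is observable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a simulated PKRU register and a write counter. */
static uint32_t model_pkru;          /* simulated register, starts at 0 */
static unsigned model_wrpkru_count;  /* number of "slow" writes issued */

/* Models the cheap read side (RDPKRU in the real code). */
static uint32_t model_read_pkru(void)
{
	return model_pkru;
}

/* Models the expensive write side (WRPKRU in the real code). */
static void model_wrpkru(uint32_t val)
{
	model_pkru = val;
	model_wrpkru_count++;
}

/* Writes PKRU, but skips the slow write when the value is unchanged. */
static void model_write_pkru(uint32_t val)
{
	if (val == model_read_pkru())
		return;

	model_wrpkru(val);
	assert(model_read_pkru() == val);
}
```

Calling model_write_pkru() twice in a row with the same value bumps
model_wrpkru_count only once, which is exactly the property a reader of
the call site cannot see without a comment.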
Feel free to move one of the other comments to here if you think it's
over-commented, but this site needs one.

>> BTW, this has the implicit behavior of always trying to do a
>> __write_pkru(0) on switches to kernel threads.  That seems a bit weird
>> and it is likely to impose WRPKRU overhead on switches between user and
>> kernel threads.
>>
>> The 0 value is also the most permissive, which is not great considering
>> that user mm's can be active in the page tables when running kernel
>> threads if we're being lazy.
>>
>> Seems like we should either leave PKRU alone or have 'init_pkru_value'
>> be the default.  That gives good security properties and is likely to
>> match the application value, removing the WRPKRU overhead.
>
> Last time we talked about this we agreed (or this was my impression) that
> 0 should be written so that the kernel thread should always be able to
> write to user space in case it borrowed its mm (otherwise it has none
> and it would fail anyway).

We can't write to userspace when borrowing an mm.  If the kernel borrows
an mm, we might as well be on the init_mm which has no userspace mappings.

> We didn't want to leave PKRU alone because the outcome (whether or not
> the write by the kernel thread succeeds) should not depend on the last
> running task (and be random) but deterministic.

Right, so let's make it deterministically restrictive: either
init_pkru_value, or -1 since kernel threads shouldn't be touching
userspace in the first place.
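For illustration, a userspace model of how the value selection could
look with a restrictive default.  This is only a sketch under stated
assumptions, not the patch: the model_* names and struct are
hypothetical, MODEL_INIT_PKRU assumes the usual default of "access
disabled for keys 1-15", and the lookup helper merely imitates
get_xsave_addr() returning NULL when the PKRU bit is clear in
xfeatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_XFEATURE_PKRU	9		/* PKRU is XSAVE state component 9 */
#define MODEL_INIT_PKRU		0x55555554u	/* assumed restrictive default */

/* Hypothetical, heavily reduced stand-in for an XSAVE buffer. */
struct model_xsave {
	uint64_t xfeatures;	/* models xsave->header.xfeatures */
	uint32_t pkru;		/* models the saved PKRU component */
};

/* Imitates get_xsave_addr(): NULL if the component is in its init state. */
static const uint32_t *model_get_pkru_addr(const struct model_xsave *xs)
{
	if (!(xs->xfeatures & (1ULL << MODEL_XFEATURE_PKRU)))
		return NULL;

	return &xs->pkru;
}

/*
 * Picks the PKRU value to load on a switch.  Kernel threads (has_mm == 0)
 * and tasks whose PKRU component is absent both get the restrictive
 * default, so the result never depends on the previously running task.
 */
static uint32_t model_switch_pkru(const struct model_xsave *xs, int has_mm)
{
	uint32_t val = MODEL_INIT_PKRU;

	assert(xs != NULL);

	if (has_mm) {
		const uint32_t *pk = model_get_pkru_addr(xs);

		if (pk)
			val = *pk;
	}

	return val;
}
```

With the PKRU bit cleared in xfeatures, model_get_pkru_addr() returns
NULL and the default is used without any warning, which is the case the
WARN_ON_ONCE() above would otherwise trip on.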