On 01/10/2024 15:36, Joey Gouly wrote:
> As POE support was recently added, update the documentation.
>
> Also note that kernel threads have a default protection key register value.
>
> Signed-off-by: Joey Gouly <joey.gouly@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Jonathan Corbet <corbet@xxxxxxx>
> ---
>  Documentation/core-api/protection-keys.rst | 38 +++++++++++++++++-----
>  1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
> index bf28ac0401f3..28ef6269041c 100644
> --- a/Documentation/core-api/protection-keys.rst
> +++ b/Documentation/core-api/protection-keys.rst
> @@ -12,7 +12,11 @@ Pkeys Userspace (PKU) is a feature which can be found on:
>   * Intel server CPUs, Skylake and later
>   * Intel client CPUs, Tiger Lake (11th Gen Core) and later
>   * Future AMD CPUs
> + * arm64 CPUs with Permission Overlay Extension (FEAT_S1POE), introduced
> +   in Arm v8.8

POE is optional from v8.8, but it was introduced as part of v8.9 [1].

[1] https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv8-9-architecture-extension?lang=en#md454-the-armv89-architecture-extension__feat_FEAT_S1POE

> +x86_64
> +======
>  Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
>  a "protection key", giving 16 possible keys.
>
> @@ -28,6 +32,21 @@ register. The feature is only available in 64-bit mode, even though there is
>  theoretically space in the PAE PTEs. These permissions are enforced on data
>  access only and have no effect on instruction fetches.
>
> +arm64
> +========

Nit: empty line after title, and ideally the number of = should match
the length of the title.

> +Pkeys use 3 bits in each page table entry, to encod3 a "protection key index",

s/encod3/encode/

> +giving 8 possible keys.
> +
> +Protections for each key are defined with a per-CPU user-writable system
> +register (POR_EL0). This is a 64-bit register, encoding read, write and execute
> +overrides flags for each protection key index.

I think sticking to the "overlay" terminology is preferable - "overrides"
may suggest that permissions are replaced (i.e. potentially increased).

Kevin

> +
> +Being a CPU register, POR_EL0 is inherently thread-local, potentially giving
> +each thread a different set of protections from every other thread.
> +
> +Unlike x86_64, the protection key permissions also apply to instruction
> +fetches.
> +
>  Syscalls
>  ========
>
> @@ -38,11 +57,10 @@ There are 3 system calls which directly interact with pkeys::
>      int pkey_mprotect(unsigned long start, size_t len,
>                        unsigned long prot, int pkey);
>
> -Before a pkey can be used, it must first be allocated with
> -pkey_alloc(). An application calls the WRPKRU instruction
> -directly in order to change access permissions to memory covered
> -with a key. In this example WRPKRU is wrapped by a C function
> -called pkey_set().
> +Before a pkey can be used, it must first be allocated with pkey_alloc(). An
> +application writes to the architecture specific CPU register directly in order
> +to change access permissions to memory covered with a key. In this example
> +this is wrapped by a C function called pkey_set().
>  ::
>
>      int real_prot = PROT_READ|PROT_WRITE;
> @@ -64,9 +82,9 @@ is no longer in use::
>      munmap(ptr, PAGE_SIZE);
>      pkey_free(pkey);
>
> -.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
> -          An example implementation can be found in
> -          tools/testing/selftests/x86/protection_keys.c.
> +.. note:: pkey_set() is a wrapper around writing to the CPU register.
> +          Example implementations can be found in
> +          tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
>
>  Behavior
>  ========
> @@ -96,3 +114,7 @@ with a read()::
>  The kernel will send a SIGSEGV in both cases, but si_code will be set
>  to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
>  the plain mprotect() permissions are violated.
> +
> +Note that kernel accesses from a kthread (such as io_uring), will use a default
> +value for the protection key register, so will not be consistent with
> +userspace's value of the register or mprotect.
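
To make the "overlay" vs "overrides" distinction raised above concrete, here
is a minimal sketch in C of how an overlay combines with the base page
permissions: the effective permission is the intersection of the two, so an
overlay can remove access but never grant it. All names (PERM_*,
overlay_get(), overlay_set(), effective_perms()) are hypothetical, and the
layout (one 4-bit field per key index, with R/X/W in the low three bits) is
an assumption for illustration, not a statement of the architected POR_EL0
encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical permission bits for this sketch; not the kernel's definitions. */
#define PERM_R 0x1u
#define PERM_X 0x2u
#define PERM_W 0x4u

#define OVERLAY_FIELD_BITS 4u	/* assumed: one 4-bit field per key index */
#define OVERLAY_NR_KEYS    8u	/* 3 PTE bits give 8 key indices */

/* Read the overlay permissions stored for key index idx. */
static inline unsigned int overlay_get(uint64_t por, unsigned int idx)
{
	assert(idx < OVERLAY_NR_KEYS);
	return (unsigned int)((por >> (idx * OVERLAY_FIELD_BITS)) & 0x7u);
}

/* Return a copy of por with the field for key index idx set to perms. */
static inline uint64_t overlay_set(uint64_t por, unsigned int idx,
				   unsigned int perms)
{
	unsigned int shift = idx * OVERLAY_FIELD_BITS;

	assert(idx < OVERLAY_NR_KEYS);
	por &= ~((uint64_t)0xf << shift);
	por |= (uint64_t)(perms & 0x7u) << shift;
	return por;
}

/*
 * Overlay semantics: the result is the intersection of the base (page
 * table) permissions and the overlay, so it is never wider than base.
 */
static inline unsigned int effective_perms(unsigned int base, uint64_t por,
					   unsigned int idx)
{
	return base & overlay_get(por, idx);
}
```

With this model, a key whose overlay is PERM_R applied to a read-write
mapping yields read-only access, while an overlay of R|W|X applied to a
read-only mapping still yields read-only access. An "override" would instead
replace the base permissions, which is the reading the wording should avoid.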