On Fri, Jan 27, 2023 at 1:12 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
>
> On Fri, Jan 27, 2023 at 11:30 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 27, 2023 at 11:22 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 11:08 AM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jan 26, 2023 at 9:55 PM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jan 26, 2023 at 9:36 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jan 25, 2023 at 5:43 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 11:13 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On 1/25/23 11:02, Jeff Xu wrote:
> > > > > > > > > I'm investigating if there is a need to backport x86/pkeys
> > > > > > > > > fix/feature into earlier kernel versions, Chrome is starting to use
> > > > > > > > > PKEY in x86, and I hope experts here can give advice on this.
> > > > > > > > >
> > > > > > > > > For background, ChromeOS regularly syncs with upstream kernel
> > > > > > > > > versions, and has production that uses 4.4/4.14/4.19/5.4/5.10/5.15.
> > > > > > > > To be honest, I haven't got the foggiest idea what you need to backport.
> > > > > > > > I can barely keep track of mainline.
> > > > > > > >
> > > > > > > > Are there really production 4.4 kernels that you need to run on
> > > > > > > > pkey-capable hardware? That would mean running a 2015-era kernel on a
> > > > > > > > CPU released in late 2020. I think Q3'2020 is when the 11th gen CPUs
> > > > > > > > came out which were the first non-server CPUs that had pkeys.
> > > > > > > >
> > > > > > > Thanks!
> > > > > > > For 11th gen CPUs, chromebook uses 5.4 and above, so that eliminate
> > > > > > > half of the versions.
> > > > > > >
> > > > > > > > On a positive note, the pkeys selftest has been pretty consistently
> > > > > > > > updated as we find bugs.
> > > > > > > > I'd be curious how well a mainline version of
> > > > > > > > that selftests runs on old kernels. But, I'm too scared to find out
> > > > > > > > what's down that particular rabbit hole myself.
> > > > > > >
> > > > > > I took the latest selftest from main and run on 5.15 kernel,
> > > > > > all passed except test_ptrace_modifies_pkru
> > > > > > assert() at protection_keys.c::1623 test_nr: 20 iteration: 1
> > > > > >
> > > > > > Is there a bugfix for the ptrace area ?
> > > > > > Thanks
> > > > >
> > > > >
> > > > > What 5.15 series kernel did you run it on? The patches for that didn't
> > > > > get backported until 5.15.88
> > > > >
> > > > Thanks! I'm using 5.15.87.
> > > > Will this patch set be backported to 5.4 and 5.10 ?
> > > > The selftest (from main) also failed on 5.4, in the same test,
> > > > but at different line:
> > > > assert() at protection_keys.c::1651 test_nr: 20 iteration: 1
> > > >
> > > The regression that patch set was intended to fix was introduced in
> > > 5.14. I don't know why the test is failing on 5.4 but I have no plans
> > > to investigate it.
> > >
> > Just looking at the line the test is failing on though I would suspect
> > that when PKRU was being managed by XSAVE (pre-5.14) that the PKRU
> > register didn't get updated for clearing XSTATE_BV until the XRSTOR
> > was actually executed (upon return to userspace). So multiple ptrace
> > calls in succession without userspace code execution would see a stale
> > PKRU value if the PKRU register was "changed" by clearing the relevant
> > XSTATE_BV flag. This is an extreme edge case, so I doubt you actually
> > care about the behavior.
> >

I have another case of a test_ptrace_modifies_pkru failure, this time
on an AMD 5000 CPU with a 5.15.98 kernel.

The odd thing is that if I run the whole protection_keys suite
(20 cases), it passes.
If I run only the last case (by commenting out the others), it fails
with the error below:

has pkeys: 1
startup pkey_reg: 0000000055555550
assert() at protection_keys.c::1623 test_nr: 0 iteration: 1
running abort_hooks()...
errno at assert: 0

The same test passes on an Intel CPU.

Is this a known issue, or does anyone have a repro?

Another question regarding PKRU, which may or may not be related to
this failure on AMD: during a thread context switch, is PKRU saved to
the thread's user-space stack? Is that what XSAVE does (pre-5.14), and
if XSAVE is no longer used for this in 5.15, what is used instead?

Thanks
-Jeff

> Thank you for the details!
> -Jeff
>
> > - Kyle
> >
> > > - Kyle
> > >
> > > > - Jeff
> > > >
> > > > > - Kyle
> > > > >
> > > > > >
> > > > > >
> > > > > > > I can start with 5.10 or 5.15, it seems there are quite some changes though,
> > > > > > > for example, this one by Thomas
> > > > > > > https://lore.kernel.org/lkml/20210623120127.327154589@xxxxxxxxxxxxx/
> > > > > > >
> > > > > > > My question is, if I have to pick a version that doesn't require a lot
> > > > > > > of backporting,
> > > > > > > and functionality is stable enough, what version would this be ? 5.4/5.10/5.15 ?
> > > > > > >
> > > > > > > -Jeff
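
For anyone reading the "startup pkey_reg" value in the log above, here
is a rough sketch of how a PKRU value decodes into per-key rights. It
assumes the architectural layout of two bits per key (bit 2*i is
access-disable, bit 2*i+1 is write-disable, 16 keys in total). The
helper names decode_pkru and accessible_keys are made up for this
illustration and are not part of the selftest; Python is used only to
show the arithmetic.

```python
# Illustrative only: decode a 32-bit PKRU value into per-key permissions.
# Assumed layout: for key i, bit 2*i is AD (access disable) and
# bit 2*i + 1 is WD (write disable).

NUM_PKEYS = 16


def decode_pkru(pkru):
    """Return a list of (access_disabled, write_disabled), one tuple per key."""
    rights = []
    for key in range(NUM_PKEYS):
        bits = (pkru >> (2 * key)) & 0x3
        rights.append((bool(bits & 0x1), bool(bits & 0x2)))
    return rights


def accessible_keys(pkru):
    """Return the keys whose AD bit is clear (reads are permitted)."""
    return [key for key, (ad, _wd) in enumerate(decode_pkru(pkru)) if not ad]


if __name__ == "__main__":
    startup = 0x55555550  # "startup pkey_reg" from the failing run
    print(accessible_keys(startup))  # prints [0, 1]
```

Under that layout, 0x55555550 leaves keys 0 and 1 accessible and marks
keys 2 through 15 access-disabled.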