Re: x86/pkeys in early kernel version

On Fri, Jan 27, 2023 at 1:12 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
>
> On Fri, Jan 27, 2023 at 11:30 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 27, 2023 at 11:22 AM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 11:08 AM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jan 26, 2023 at 9:55 PM Kyle Huey <me@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jan 26, 2023 at 9:36 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jan 25, 2023 at 5:43 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 11:13 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On 1/25/23 11:02, Jeff Xu wrote:
> > > > > > > > > I'm investigating if there is a need to backport x86/pkeys
> > > > > > > > > fix/feature into earlier kernel versions, Chrome is starting to use
> > > > > > > > > PKEY in x86, and I hope experts here can give advice on this.
> > > > > > > > >
> > > > > > > > > For background, ChromeOS regularly syncs with upstream kernel
> > > > > > > > > versions, and has production that uses 4.4/4.14/4.19/5.4/5.10/5.15.
> > > > > > > > To be honest, I haven't got the foggiest idea what you need to backport.
> > > > > > > >  I can barely keep track of mainline.
> > > > > > > >
> > > > > > > > Are there really production 4.4 kernels that you need to run on
> > > > > > > > pkey-capable hardware?  That would mean running a 2015-era kernel on a
> > > > > > > > CPU released in late 2020.  I think Q3'2020 is when the 11th gen CPUs
> > > > > > > > came out which were the first non-server CPUs that had pkeys.
> > > > > > > >
> > > > > > > Thanks!
> > > > > > > For 11th gen CPUs, chromebook uses 5.4 and above, so that eliminate
> > > > > > > half of the versions.
> > > > > > >
> > > > > > > > On a positive note, the pkeys selftest has been pretty consistently
> > > > > > > > updated as we find bugs.  I'd be curious how well a mainline version of
> > > > > > > > that selftests runs on old kernels.  But, I'm too scared to find out
> > > > > > > > what's down that particular rabbit hole myself.
> > > > > > >
> > > > > > I took the latest selftest from main and run on 5.15 kernel,
> > > > > > all passed except test_ptrace_modifies_pkru
> > > > > > assert() at protection_keys.c::1623 test_nr: 20 iteration: 1
> > > > > >
> > > > > > Is there a bugfix for the ptrace area ?
> > > > > > Thanks
> > > > >
> > > > >
> > > > > What 5.15 series kernel did you run it on? The patches for that didn't
> > > > > get backported until 5.15.88
> > > > >
> > > > Thanks! I'm using 5.15.87.
> > > > Will this patch set be backported to 5.4 and 5.10 ?
> > > > The selftest (from main) also failed on 5.4, in the same test,
> > > > but at different line:
> > > > assert() at protection_keys.c::1651 test_nr: 20 iteration: 1
> > >
> > > The regression that patch set was intended to fix was introduced in
> > > 5.14. I don't know why the test is failing on 5.4 but I have no plans
> > > to investigate it.
> >
> > Just looking at the line the test is failing on though I would suspect
> > that when PKRU was being managed by XSAVE (pre-5.14) that the PKRU
> > register didn't get updated for clearing XSTATE_BV until the XRSTOR
> > was actually executed (upon return to userspace). So multiple ptrace
> > calls in succession without userspace code execution would see a stale
> > PKRU value if the PKRU register was "changed" by clearing the relevant
> > XSTATE_BV flag. This is an extreme edge case, so I doubt you actually
> > care about the behavior.
> >

I have another case of test_ptrace_modifies_pkru failure.
This one happens on an AMD 5000 CPU with a 5.15.98 kernel.
The odd thing is:
If I run the whole set of protection_keys tests (20 cases), it passes.
If I run only the last case (by commenting out the others), it fails
with the error below:

has pkeys: 1
startup pkey_reg: 0000000055555550
assert() at protection_keys.c::1623 test_nr: 0 iteration: 1
running abort_hooks()...
errno at assert: 0
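
For reference, a minimal sketch of how the startup pkey_reg value in
the log above decodes, assuming the architectural PKRU layout of two
bits per key (bit 2*k is access-disable, bit 2*k+1 is write-disable).
The helper names pkru_bits_for_key() and dump_pkru() are hypothetical
and are not taken from the selftest:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PKEYS     16    /* x86 PKRU covers 16 protection keys */
#define PKRU_AD_BIT  0x1u  /* access disable: bit 2*k of PKRU */
#define PKRU_WD_BIT  0x2u  /* write disable:  bit 2*k+1 of PKRU */

/* Hypothetical helper: extract the 2-bit AD/WD field for one key. */
static unsigned int pkru_bits_for_key(uint32_t pkru, int pkey)
{
	assert(pkey >= 0 && pkey < NR_PKEYS);
	return (pkru >> (2 * pkey)) & 0x3u;
}

/* Hypothetical helper: print the per-key rights encoded in a PKRU value. */
static void dump_pkru(uint32_t pkru)
{
	for (int pkey = 0; pkey < NR_PKEYS; pkey++) {
		unsigned int bits = pkru_bits_for_key(pkru, pkey);

		printf("pkey %2d: AD=%u WD=%u\n", pkey,
		       (bits & PKRU_AD_BIT) ? 1u : 0u,
		       (bits & PKRU_WD_BIT) ? 1u : 0u);
	}
}
```

Calling dump_pkru(0x55555550) shows keys 0 and 1 with no restrictions
and keys 2 through 15 with AD set and WD clear.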

The same test passes on an Intel CPU.

Is this a known issue, or does anyone have a repro?

Another question regarding PKRU, which may or may not be related to this
failure on AMD:
During a thread context switch, is PKRU saved to the thread's
user-space stack? Is this what XSAVE does (pre-5.14), and if we are
not using XSAVE after 5.15, what is used?

Thanks
-Jeff



> Thank you for the details!
> -Jeff
>
> > - Kyle
> >
> > > - Kyle
> > >
> > > > - Jeff
> > > >
> > > > > - Kyle
> > > > >
> > > > > >
> > > > > >
> > > > > > > I can start with 5.10 or 5.15, it seems there are quite some changes though,
> > > > > > > for example,  this one by Thomas
> > > > > > > https://lore.kernel.org/lkml/20210623120127.327154589@xxxxxxxxxxxxx/
> > > > > > >
> > > > > > > My question is, if I have to pick a version that doesn't require a lot
> > > > > > > of backporting,
> > > > > > > and functionality is stable enough, what version would this be ? 5.4/5.10/5.15 ?
> > > > > > >
> > > > > > > -Jeff




