Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions

Ira Weiny <ira.weiny@xxxxxxxxx> · Fri, 24 Jul 2020 12:43:56 -0700

On Fri, Jul 24, 2020 at 10:29:23AM -0700, Andy Lutomirski wrote:
> 
> > On Jul 24, 2020, at 10:23 AM, Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> > 
> > On Thu, Jul 23, 2020 at 10:15:17PM +0200, Thomas Gleixner wrote:
> >> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> >> 
> >>> Ira Weiny <ira.weiny@xxxxxxxxx> writes:
> >>>> On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote:
> >>>>>> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@xxxxxxxxx wrote:
> >>>>> I've been really digging into this today and I'm very concerned that I'm
> >>>>> completely missing something WRT idtentry_enter() and idtentry_exit().
> >>>>> 
> >>>>> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
> >>>>> with trace_printk()'s.
> >>>>> 
> >>>>> With this debug code, I have found an instance where it seems like
> >>>>> idtentry_enter() is called without a corresponding idtentry_exit().  This has
> >>>>> left the thread ref counter at 0 which results in very bad things happening
> >>>>> when __dev_access_disable() is called and the ref count goes negative.
> >>>>> 
> >>>>> Effectively this seems to be happening:
> >>>>> 
> >>>>> ...
> >>>>>    // ref == 0
> >>>>>    dev_access_enable()  // ref += 1 ==> disable protection
> >>>>>        // exception  (which one I don't know)
> >>>>>            idtentry_enter()
> >>>>>                // ref = 0
> >>>>>                _handler() // or whatever code...
> >>>>>            // *_exit() not called [at least there is no trace_printk() output]...
> >>>>>            // Regardless of trace output, the ref is left at 0
> >>>>>    dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection
> >>>>>    (Bad stuff is bound to happen now...)
> >>> 
> >>> Well, if any exception which calls idtentry_enter() would return without
> >>> going through idtentry_exit() then lots of bad stuff would happen even
> >>> without your patches.
> >>> 
> >>>> Also is there any chance that the process could be getting scheduled and that
> >>>> is causing an issue?
> >>> 
> >>> Only from #PF, but after the fault has been resolved and the tasks is
> >>> scheduled in again then the task returns through idtentry_exit() to the
> >>> place where it took the fault. That's not guaranteed to be on the same
> >>> CPU. If schedule is not aware of the fact that the exception turned off
> >>> stuff then you surely get into trouble. So you really want to store it
> >>> in the task itself then the context switch code can actually see the
> >>> state and act accordingly.
> >> 
> >> Actually thats nasty as well as you need a stack of PKRS values to
> >> handle nested exceptions. But it might be still the most reasonable
> >> thing to do. 7 PKRS values plus an index should be really sufficient,
> >> that's 32bytes total, not that bad.
> > 
> > I've thought about this a bit more and unless I'm wrong I think the
> > idtentry_state provides for that because each nested exception has it's own
> > idtentry_state doesn't it?
> 
> Only the ones that use idtentry_enter() instead of, say, nmi_enter().

Oh agreed...

But with this patch we are still better off than just preserving during context
switch.

I need to update the commit message here to make this clear though.

Thanks,
Ira