On Fri, Jul 24, 2020 at 10:29:23AM -0700, Andy Lutomirski wrote: > > > On Jul 24, 2020, at 10:23 AM, Ira Weiny <ira.weiny@xxxxxxxxx> wrote: > > > > On Thu, Jul 23, 2020 at 10:15:17PM +0200, Thomas Gleixner wrote: > >> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes: > >> > >>> Ira Weiny <ira.weiny@xxxxxxxxx> writes: > >>>> On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote: > >>>>>> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@xxxxxxxxx wrote: > >>>>> I've been really digging into this today and I'm very concerned that I'm > >>>>> completely missing something WRT idtentry_enter() and idtentry_exit(). > >>>>> > >>>>> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able() > >>>>> with trace_printk()'s. > >>>>> > >>>>> With this debug code, I have found an instance where it seems like > >>>>> idtentry_enter() is called without a corresponding idtentry_exit(). This has > >>>>> left the thread ref counter at 0 which results in very bad things happening > >>>>> when __dev_access_disable() is called and the ref count goes negative. > >>>>> > >>>>> Effectively this seems to be happening: > >>>>> > >>>>> ... > >>>>> // ref == 0 > >>>>> dev_access_enable() // ref += 1 ==> disable protection > >>>>> // exception (which one I don't know) > >>>>> idtentry_enter() > >>>>> // ref = 0 > >>>>> _handler() // or whatever code... > >>>>> // *_exit() not called [at least there is no trace_printk() output]... > >>>>> // Regardless of trace output, the ref is left at 0 > >>>>> dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection > >>>>> (Bad stuff is bound to happen now...) > >>> > >>> Well, if any exception which calls idtentry_enter() would return without > >>> going through idtentry_exit() then lots of bad stuff would happen even > >>> without your patches. > >>> > >>>> Also is there any chance that the process could be getting scheduled and that > >>>> is causing an issue? > >>> > >>> Only from #PF, but after the fault has been resolved and the tasks is > >>> scheduled in again then the task returns through idtentry_exit() to the > >>> place where it took the fault. That's not guaranteed to be on the same > >>> CPU. If schedule is not aware of the fact that the exception turned off > >>> stuff then you surely get into trouble. So you really want to store it > >>> in the task itself then the context switch code can actually see the > >>> state and act accordingly. > >> > >> Actually thats nasty as well as you need a stack of PKRS values to > >> handle nested exceptions. But it might be still the most reasonable > >> thing to do. 7 PKRS values plus an index should be really sufficient, > >> that's 32bytes total, not that bad. > > > > I've thought about this a bit more and unless I'm wrong I think the > > idtentry_state provides for that because each nested exception has it's own > > idtentry_state doesn't it? > > Only the ones that use idtentry_enter() instead of, say, nmi_enter(). Oh agreed... But with this patch we are still better off than just preserving during context switch. I need to update the commit message here to make this clear though. Thanks, Ira