Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
> Ira Weiny <ira.weiny@xxxxxxxxx> writes:
>> On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote:
>>> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@xxxxxxxxx wrote:
>> I've been really digging into this today and I'm very concerned that I'm
>> completely missing something WRT idtentry_enter() and idtentry_exit().
>>
>> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
>> with trace_printk()'s.
>>
>> With this debug code, I have found an instance where it seems like
>> idtentry_enter() is called without a corresponding idtentry_exit(). This has
>> left the thread ref counter at 0 which results in very bad things happening
>> when __dev_access_disable() is called and the ref count goes negative.
>>
>> Effectively this seems to be happening:
>>
>> ...
>>   // ref == 0
>>   dev_access_enable() // ref += 1 ==> disable protection
>>     // exception (which one I don't know)
>>       idtentry_enter()
>>         // ref = 0
>>         _handler() // or whatever code...
>>       // *_exit() not called [at least there is no trace_printk() output]...
>>       // Regardless of trace output, the ref is left at 0
>>   dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection
>>   (Bad stuff is bound to happen now...)
>
> Well, if any exception which calls idtentry_enter() would return without
> going through idtentry_exit() then lots of bad stuff would happen even
> without your patches.
>
>> Also is there any chance that the process could be getting scheduled and that
>> is causing an issue?
>
> Only from #PF, but after the fault has been resolved and the tasks is
> scheduled in again then the task returns through idtentry_exit() to the
> place where it took the fault. That's not guaranteed to be on the same
> CPU. If schedule is not aware of the fact that the exception turned off
> stuff then you surely get into trouble.
> So you really want to store it
> in the task itself then the context switch code can actually see the
> state and act accordingly.

Actually that's nasty as well, as you need a stack of PKRS values to
handle nested exceptions. But it might still be the most reasonable
thing to do. 7 PKRS values plus an index should really be sufficient;
that's 32 bytes total, not that bad.

Thanks,

        tglx
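
For illustration only, a rough userspace sketch of what a per-task stack of
7 PKRS values plus an index could look like. Nothing here is from the patch
set: struct pkrs_stack, struct task_model, pkrs_save(), pkrs_restore(),
model_switch_to() and pkrs_demo() are made-up names, model_pkrs is a plain
variable standing in for the real register, and the values 0x0/0x1 are
arbitrary. It assumes each PKRS value is 32 bits wide, which is what makes
the total come out at 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define PKRS_STACK_DEPTH 7	/* "7 PKRS values" from the mail above */

/* Hypothetical per-task save area; illustrative, not kernel API. */
struct pkrs_stack {
	uint32_t saved[PKRS_STACK_DEPTH];	/* values saved by nested exceptions */
	uint32_t idx;				/* number of valid entries in saved[] */
};

/* 7 * 4 bytes for the values + 4 bytes for the index = 32 bytes. */
_Static_assert(sizeof(struct pkrs_stack) == 32, "expected 32 bytes");

/* Hypothetical task: the live value while scheduled out, plus the stack. */
struct task_model {
	uint32_t pkrs;
	struct pkrs_stack stack;
};

static uint32_t model_pkrs;	/* stand-in for the per-CPU register */

/* Exception entry: push the current value, install entry_val. */
static int pkrs_save(struct pkrs_stack *s, uint32_t entry_val)
{
	if (s->idx >= PKRS_STACK_DEPTH)
		return -1;	/* nesting deeper than the stack allows */
	s->saved[s->idx++] = model_pkrs;
	model_pkrs = entry_val;
	return 0;
}

/* Exception exit: pop the value that was live when the exception hit. */
static int pkrs_restore(struct pkrs_stack *s)
{
	if (s->idx == 0)
		return -1;	/* exit without a matching entry */
	model_pkrs = s->saved[--s->idx];
	return 0;
}

/* Context switch: the state travels with the task, not with the CPU. */
static void model_switch_to(struct task_model *prev, struct task_model *next)
{
	prev->pkrs = model_pkrs;
	model_pkrs = next->pkrs;
}

/* Walk through the #PF case discussed above; returns 0 if all went well. */
static int pkrs_demo(void)
{
	struct task_model a = { .pkrs = 0 }, b = { .pkrs = 0 };
	int i;

	model_pkrs = 0x1;		/* task A has device access enabled */
	if (pkrs_save(&a.stack, 0x0))	/* exception entry */
		return 1;
	model_switch_to(&a, &b);	/* A sleeps inside the handler */
	model_switch_to(&b, &a);	/* A resumes, maybe on another CPU */
	if (pkrs_restore(&a.stack) || model_pkrs != 0x1)
		return 2;		/* exit must bring back A's value */

	for (i = 0; i < PKRS_STACK_DEPTH; i++)
		if (pkrs_save(&a.stack, 0x0))
			return 3;
	/* An eighth nested entry is rejected rather than overflowing. */
	return pkrs_save(&a.stack, 0x0) == -1 ? 0 : 4;
}
```

Because the saved values live in the task, the restore on exit picks up the
right value even when the task was scheduled out in between, and an
unbalanced exit shows up as an error instead of a silently wrong value.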