On Sun, Sep 20 2020 at 09:57, Linus Torvalds wrote: > On Sun, Sep 20, 2020 at 1:49 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > Btw, looking at the stack code, Ithink your new implementation of it > is a bit scary: > > static inline int kmap_atomic_idx_push(void) > { > - int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1; > + int idx = current->kmap_ctrl.idx++; > > and now that 'current->kmap_ctrl.idx' is not atomic wrt > > (a) NMI's (this may be ok, maybe we never do kmaps in NMIs, and with > nesting I think it's fine anyway - the NMI will undo whatever it did) Right. Nesting should be a non issue, but I don't think we have kmap_atomic() in NMI context. > (b) the prev/next switch > > And that (b) part worries me. You do the kmap_switch_temporary() to > switch the entries, but you do that *separately* from actually > switching 'current' to the new value. > > So kmap_switch_temporary() looks safe, but I don't think it actually > is. Because while it first unmaps the old entries and then remaps the > new ones, an interrupt can come in, and at that point it matters what > is *CURRENT*. > > And regardless of whether 'current' is 'prev' or 'next', that > kmap_switch_temporary() loop may be doing the wrong thing, depending > on which one had the deeper stack. The interrupt will be using > whatever "current->kmap_ctrl.idx" is, but that might overwrite entries > that are in the process of being restored (if current is still 'prev', > but kmap_switch_temporary() is in the "restore @next's kmaps" pgase), > or it might stomp on entries that have been pte_clear()'ed by the > 'prev' thing. Duh yes. Never thought about that. > Alternatively, that process counter would need about a hundred lines > of commentary about exactly why it's safe. Because I don't think it > is. I think the more obvious solution is to split the whole exercise: schedule() prepare_switch() unmap() switch_to() finish_switch() map() That's safe because neither the unmap() nor the map() code changes kmap_ctrl.idx. So if there is an interrupt coming in between unmap() and switch_to() then a kmap_local() there will use the next entry. So we could even do the unmap() with interrupts enabled (preemption disabled). Same for the map() part. To explain that we need only a few lines of commentry methinks. Thanks, tglx