On Fri, Jan 15, 2021 at 04:27:54PM +0100, Borislav Petkov wrote:
> On Thu, Jan 14, 2021 at 04:38:17PM -0800, Tony Luck wrote:
> > Add a "mce_busy" counter so that task_work_add() is only called once
> > per faulty page in this task.
>
> Yeah, that sentence can be removed now too.

I will update with new name "mce_count" and some details.

> > -static void queue_task_work(struct mce *m, int kill_current_task)
> > +static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
>
> So this function gets called in the user mode MCE case too:
>
> 	if ((m.cs & 3) == 3) {
>
> 		queue_task_work(&m, msg, kill_current_task);
> 	}
>
> Do we want to panic for multiple MCEs to different addresses in user
> mode?

In the user mode case we should only bump mce_count to "1" and
before task_work() gets called. It shouldn't hurt to do the same
checks. Maybe it will catch something weird - like an NMI handler
on return from the machine check doing a get_user() that hits another
machine check during the return from this machine check.

AndyL has made me extra paranoid. :-)

> > -	current->mce_addr = m->addr;
> > -	current->mce_kflags = m->kflags;
> > -	current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> > -	current->mce_whole_page = whole_page(m);
> > +	if (current->mce_count++ == 0) {
> > +		current->mce_addr = m->addr;
> > +		current->mce_kflags = m->kflags;
> > +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> > +		current->mce_whole_page = whole_page(m);
> > +	}
> > +
>
> 	/* Magic number should be large enough */
>
> > +	if (current->mce_count > 10)

Will add similar comment here ... and to other tests in this function
since it may not be obvious to me next year what I was thinking now :-)

> > +	if (current->mce_count > 10)
> > +		mce_panic("Too many machine checks while accessing user data", m, msg);
> > +
> > +	if (current->mce_count > 1 || (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
> > +		mce_panic("Machine checks to different user pages", m, msg);
>
> Will this second part of the test expression, after the "||" ever hit?

No :-( This code is wrong. Should be "&&" not "||". Then it makes more sense.
Will fix for v4.

> In any case, what are you trying to catch with this? Two get_user() to
> different pages both catching MCEs?

Yes. Trying to catch two accesses to different pages. Need to do this
because kill_me_maybe() is only going to offline one page.

I'm not expecting that this would ever hit. It means that calling code
took a machine check on one page and get_user() said -EFAULT. Then the
code decided to access a different page *and* that other page also
triggered a machine check.

> > +	/* Do not call task_work_add() more than once */
> > +	if (current->mce_count > 1)
> > +		return;
>
> That won't happen either, AFAICT. It'll panic above.

With the s/||/&&/ above, we can get here.

>
> Regardless, I like how this is all confined to the MCE code and there's
> no need to touch stuff outside...

Thanks for the review.

-Tony
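
For reference, here is a rough userspace sketch of the order of the checks
discussed above, with the s/||/&&/ fix applied. It is only an illustration
of the control flow, not the v4 patch: struct task_sketch, enum mce_action
and queue_task_work_sketch() are hypothetical names, PAGE_SHIFT is assumed
to be 12, and return codes stand in for mce_panic() and task_work_add().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-task fields (current->mce_*) */
struct task_sketch {
	int mce_count;
	uint64_t mce_addr;
};

/* Hypothetical result codes replacing mce_panic()/task_work_add() */
enum mce_action {
	MCE_QUEUE_WORK,		/* first fault: would call task_work_add() */
	MCE_ALREADY_QUEUED,	/* repeat fault, same page: return early */
	MCE_PANIC_TOO_MANY,	/* would mce_panic(): too many machine checks */
	MCE_PANIC_OTHER_PAGE,	/* would mce_panic(): different user pages */
};

static enum mce_action queue_task_work_sketch(struct task_sketch *t,
					      uint64_t addr)
{
	assert(t != NULL);

	/* Record the address only for the first machine check */
	if (t->mce_count++ == 0)
		t->mce_addr = addr;

	/* Magic number should be large enough */
	if (t->mce_count > 10)
		return MCE_PANIC_TOO_MANY;

	/* Second or later machine check *and* it is on a different page */
	if (t->mce_count > 1 &&
	    (t->mce_addr >> PAGE_SHIFT) != (addr >> PAGE_SHIFT))
		return MCE_PANIC_OTHER_PAGE;

	/* Do not call task_work_add() more than once */
	if (t->mce_count > 1)
		return MCE_ALREADY_QUEUED;

	return MCE_QUEUE_WORK;
}
```

With "&&", a repeated machine check on the same page falls through to the
early return instead of panicking, which is the case the last test in the
function is meant to handle.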