Re: [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail

Jason Gunthorpe <jgg@xxxxxxxx> · Wed, 14 Aug 2019 20:22:38 -0300

On Wed, Aug 14, 2019 at 03:14:47PM -0700, Andrew Morton wrote:
> On Wed, 14 Aug 2019 22:20:23 +0200 Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> 
> > Just a bit of paranoia, since if we start pushing this deep into
> > callchains it's hard to spot all places where an mmu notifier
> > implementation might fail when it's not allowed to.
> > 
> > Inspired by some confusion we had discussing i915 mmu notifiers and
> > whether we could use the newly-introduced return value to handle some
> > corner cases. Until we realized that these are only for when a task
> > has been killed by the oom reaper.
> > 
> > An alternative approach would be to split the callback into two
> > versions, one with the int return value, and the other with void
> > return value like in older kernels. But that's a lot more churn for
> > fairly little gain I think.
> > 
> > Summary from the m-l discussion on why we want something at warning
> > level: This allows automated tooling in CI to catch bugs without
> > humans having to look at everything. If we just upgrade the existing
> > pr_info to a pr_warn, then we'll have false positives. And as-is, no
> > one will ever spot the problem since it's lost in the massive amounts
> > of overall dmesg noise.
> > 
> > ...
> >
> > +++ b/mm/mmu_notifier.c
> > @@ -179,6 +179,8 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  					mn->ops->invalidate_range_start, _ret,
> >  					!mmu_notifier_range_blockable(range) ? "non-" : "");
> > +				WARN_ON(mmu_notifier_range_blockable(range) ||
> > +					ret != -EAGAIN);
> >  				ret = _ret;
> >  			}
> >  		}
> 
> A problem with WARN_ON(a || b) is that if it triggers, we don't know
> whether it was because of a or because of b.  Or both.  So I'd suggest
> 
> 	WARN_ON(a);
> 	WARN_ON(b);
> 

Well, we did just make a pr_info right above with the value of
blockable, that seems enough to tell the cases apart?

But you are generally right, the full logic:

    if (_ret) {
      if (WARN_ON(mmu_notifier_range_blockable(range)))
            continue;
      WARN_ON(_ret != -EAGAIN);
      ret = -EAGAIN;
      break;
    }

would force correct API contract up the call chain once we detect a
broken driver..

But at some point it does feel like a bit much debugging logic to have
in a production code path, as this should never happen and is just to
discourage wrong driver behaviors during driver development.

If we like this version then:

Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>

Also - I have a bunch of other patches to mmu notifiers for hmm.git,
so when everyone agrees I can grab this to avoid conflicts.

Thanks,
Jason
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel