Re: [PATCH v2 12/11] mm/hmm: Fix error flows in hmm_invalidate_range_start

Jason Gunthorpe <jgg@xxxxxxxx> · Fri, 7 Jun 2019 22:35:30 -0300

On Fri, Jun 07, 2019 at 04:52:58PM -0700, Ralph Campbell wrote:
> > @@ -141,6 +142,23 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> >   	hmm_put(hmm);
> >   }
> > +static void notifiers_decrement(struct hmm *hmm)
> > +{
> > +	lockdep_assert_held(&hmm->ranges_lock);
> > +
> > +	hmm->notifiers--;
> > +	if (!hmm->notifiers) {
> > +		struct hmm_range *range;
> > +
> > +		list_for_each_entry(range, &hmm->ranges, list) {
> > +			if (range->valid)
> > +				continue;
> > +			range->valid = true;
> > +		}
> 
> This just effectively sets all ranges to valid.
> I'm not sure that is best.

This is a trade off, it would be much more expensive to have a precise
'valid = true' - instead this algorithm is precise about 'valid =
false' and lazy about 'valid = true' which is much less costly to
calculate.

> Shouldn't hmm_range_register() start with range.valid = true and
> then hmm_invalidate_range_start() set affected ranges to false?

It kind of does, expect when it doesn't, right? :)

> Then this becomes just wake_up_all() if --notifiers == 0 and
> hmm_range_wait_until_valid() should wait for notifiers == 0.

Almost.. but it is more tricky than that.

This scheme is a collision-retry algorithm. The pagefault side runs to
completion if no parallel invalidate start/end happens.

If a parallel invalidation happens then the pagefault retries.

Seeing notifiers == 0 means there is absolutely no current parallel
invalidation.

Seeing range->valid == true (under the device lock)
means this range doesn't intersect with a parallel invalidate.

So.. hmm_range_wait_until_valid() checks the per-range valid because
it doesn't want to sleep if *this range* is not involved in a parallel
invalidation - but once it becomes involved, then yes, valid == true
implies notifiers == 0.

It is easier/safer to use unlocked variable reads if there is only one
variable, thus the weird construction.

It is unclear to me if this micro optimization is really
worthwhile. It is very expensive to manage all this tracking, and no
other mmu notifier implementation really does something like
this. Eliminating the per-range tracking and using the notifier count
as a global lock would be much simpler...

> Otherwise, range.valid doesn't really mean it's valid.

Right, it doesn't really mean 'valid'

It is tracking possible colliding invalidates such that valid == true
(under the device lock) means that there was no colliding invalidate.

I still think this implementation doesn't quite work, as I described
here:

https://lore.kernel.org/linux-mm/20190527195829.GB18019@xxxxxxxxxxxx/

But the idea is basically sound and matches what other mmu notifier
users do, just using a seqcount like scheme, not a boolean.

Jason