Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end

> On Feb 11, 2020, at 3:52 PM, Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> 
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
> 
> For instance, kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count, which was incremented during start().
> 
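
For context, the pairing KVM relies on is roughly the following. This is a
simplified sketch rather than the verbatim virt/kvm/kvm_main.c code, with the
TLB flushing and sequence-count handling omitted:

    static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                        const struct mmu_notifier_range *range)
    {
            struct kvm *kvm = mmu_notifier_to_kvm(mn);

            spin_lock(&kvm->mmu_lock);
            /* start() raises the "invalidation in progress" count */
            kvm->mmu_notifier_count++;
            spin_unlock(&kvm->mmu_lock);
            return 0;
    }

    static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
                        const struct mmu_notifier_range *range)
    {
            struct kvm *kvm = mmu_notifier_to_kvm(mn);

            spin_lock(&kvm->mmu_lock);
            /* end() is relied on to always undo the increment from start() */
            kvm->mmu_notifier_count--;
            spin_unlock(&kvm->mmu_lock);
    }

A start() that is never followed by an end() leaves the count elevated, so KVM
keeps treating the affected range as under invalidation.
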
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_end() calls are
> made, even for subscriptions whose start() had already been called.
> 
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
> 
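
To make the above concrete: a naive unwind inside the start() walk would have
to look something like the hypothetical sketch below (not part of this patch;
subscription, subscriptions and range are the locals of
mn_hlist_invalidate_range_start()), and the comment notes why it cannot work:

    hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
            const struct mmu_notifier_ops *ops = subscription->ops;

            if (ops->invalidate_range_start &&
                ops->invalidate_range_start(subscription, range)) {
                    /*
                     * A correct unwind would now have to call
                     * invalidate_range_end() on exactly those subscriptions
                     * whose start() already succeeded.  But the list is only
                     * protected by SRCU/RCU, so entries may be unlinked or
                     * added concurrently, and a second walk "up to this
                     * point" is not guaranteed to visit the same entries as
                     * the first pass did.
                     */
                    return -EAGAIN;
            }
    }

Hence the patch orders the list instead of trying to unwind.
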
> One case works correctly: if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
> 
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
> 
> A warning is printed for this case; if in future we determine this never
> happens, then we can simply fail during registration when there are
> unsupported combinations of notifiers.

This will generate a warning when running a simple qemu-kvm guest on arm64:

qemu-kvm (37712) created two mmu_notifier's with invalidate_range_end(): kvm_mmu_notifier_invalidate_range_end and kvm_mmu_notifier_invalidate_range_end, non-blocking notifiers disabled

> 
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> ---
> mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
> 
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@xxxxxxxx/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@xxxxxxxx/
> * Abandon attempting to fix it by calling invalidate_range_end() during an
>  EAGAIN start
> * Just trivially ban multiple subscriptions
> v3:
> * Be more sophisticated, ban only multiple subscriptions if the result is
>  a failure. Allows multiple subscriptions without invalidate_range_end
> * Include a printk when this condition is hit (Michal)
> 
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
> 
> drivers/misc/mic/scif/scif_dma.c:       .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
> drivers/misc/sgi-gru/grutlbpurge.c:     .invalidate_range_end   = gru_invalidate_range_end,
> virt/kvm/kvm_main.c:    .invalidate_range_end   = kvm_mmu_notifier_invalidate_range_end,
> 
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in the hope of catching it
> if they are.
> 
> Someday I expect to just fail the registration on this condition.
> 
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
> 
> Regards,
> Jason
> 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
> struct mmu_notifier_subscriptions {
> 	/* all mmu notifiers registered in this mm are queued in this list */
> 	struct hlist_head list;
> -	bool has_itree;
> +	u8 has_itree;
> +	u8 no_blocking;
> 	/* to serialize the list modifications and hlist_unhashed */
> 	spinlock_t lock;
> 	unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
> 	int ret = 0;
> 	int id;
> 
> +	if (unlikely(subscriptions->no_blocking &&
> +		     !mmu_notifier_range_blockable(range)))
> +		return -EAGAIN;
> +
> 	id = srcu_read_lock(&srcu);
> 	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
> 		const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> 	srcu_read_unlock(&srcu, id);
> }
> 
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * Non-blocking invalidation is problematic: a requirement to block results in
> + * the invalidation being aborted, and due to the use of RCU we have no
> + * reliable way to ensure that every successful invalidate_range_start()
> + * results in a call to invalidate_range_end().
> + *
> + * Thus, to support non-blocking invalidation, only the last subscription in
> + * the list can have invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> +			 struct mmu_notifier *subscription)
> +{
> +	struct mmu_notifier *last = NULL;
> +	struct mmu_notifier *itr;
> +
> +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> +		last = itr;
> +
> +	if (last && last->ops->invalidate_range_end &&
> +	    subscription->ops->invalidate_range_end) {
> +		subscriptions->no_blocking = true;
> +		pr_warn_once(
> +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
> +			current->comm, current->pid,
> +			last->ops->invalidate_range_end,
> +			subscription->ops->invalidate_range_end);
> +	}
> +	if (!last || !last->ops->invalidate_range_end)
> +		subscriptions->no_blocking = false;
> +
> +	if (last && subscription->ops->invalidate_range_end)
> +		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> +	else
> +		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
> /*
>  * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
>  * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
> 		subscription->users = 1;
> 
> 		spin_lock(&mm->notifier_subscriptions->lock);
> -		hlist_add_head_rcu(&subscription->hlist,
> -				   &mm->notifier_subscriptions->list);
> +		mn_hist_add_subscription(mm->notifier_subscriptions,
> +					 subscription);
> 		spin_unlock(&mm->notifier_subscriptions->lock);
> 	} else
> 		mm->notifier_subscriptions->has_itree = true;
> -- 
> 2.25.0
> 