On 2/11/20 12:52 PM, Jason Gunthorpe wrote:
Many users of the mmu_notifier invalidate_range callbacks maintain
locking/counters/etc on a paired basis and have long expected that
invalidate_range_start/end() are always paired.
For instance, kvm_mmu_notifier_invalidate_range_end() undoes
kvm->mmu_notifier_count, which was incremented during start().
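For reference, the pairing KVM relies on looks roughly like this (a
paraphrase of virt/kvm/kvm_main.c with most of the body trimmed, not the
exact code):

  static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                                        const struct mmu_notifier_range *range)
  {
          struct kvm *kvm = mmu_notifier_to_kvm(mn);

          spin_lock(&kvm->mmu_lock);
          /* the page fault path sees this and retries until end() */
          kvm->mmu_notifier_count++;
          /* ... invalidate shadow mappings covering the range ... */
          spin_unlock(&kvm->mmu_lock);
          return 0;
  }

  static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
                                        const struct mmu_notifier_range *range)
  {
          struct kvm *kvm = mmu_notifier_to_kvm(mn);

          spin_lock(&kvm->mmu_lock);
          /* must balance the increment made in start() */
          kvm->mmu_notifier_count--;
          spin_unlock(&kvm->mmu_lock);
  }

If start() ever runs without the matching end(), mmu_notifier_count stays
elevated and the page fault path keeps seeing an invalidation in progress.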
The recent change to add non-blocking notifiers breaks this assumption
when multiple notifiers are present in the list: when -EAGAIN is returned
from an invalidate_range_start() then no invalidate_range_end() is
called, even for subscriptions whose start() had already been called.
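To make the hole concrete, the non-blocking caller side has roughly this
shape (modelled on the oom reaper in mm/oom_kill.c, simplified from memory
rather than quoted):

        if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
                /*
                 * Some notifier returned -EAGAIN, so the whole invalidation
                 * is abandoned here.  The hlist is never walked again, so a
                 * subscription whose invalidate_range_start() already ran
                 * never receives the matching invalidate_range_end().
                 */
                return false;
        }
        unmap_page_range(&tlb, vma, range.start, range.end, NULL);
        mmu_notifier_invalidate_range_end(&range);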
Unfortunately, due to the RCU list traversal we can't reliably generate a
subset of the linked list representing the notifiers already called to
generate an invalidate_range_end() pairing.
One case does work correctly: when only one subscription requires
invalidate_range_end() and it is the last entry in the hlist. In this
case, if invalidate_range_start() returns -EAGAIN there is nothing
to unwind.
Keep the notifier hlist sorted so that notifiers that require
invalidate_range_end() are always last, and if two are added then disable
non-blocking invalidation for the mm.
A warning is printed for this case; if in the future we determine this
never happens then we can simply fail during registration when there are
unsupported combinations of notifiers.
Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
---
mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 50 insertions(+), 3 deletions(-)
v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@xxxxxxxx/
v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@xxxxxxxx/
* Abandon attempting to fix it by calling invalidate_range_end() during an
EAGAIN start
* Just trivially ban multiple subscriptions
v3:
* Be more sophisticated: ban multiple subscriptions only if the result is
a failure. Allows multiple subscriptions without invalidate_range_end()
* Include a printk when this condition is hit (Michal)
At this point the rework Christoph requested during the first posting
is completed and there are now only 3 drivers using
invalidate_range_end():
drivers/misc/mic/scif/scif_dma.c: .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
virt/kvm/kvm_main.c: .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
While I think it is unlikely that any of these drivers will be used in
combination with each other, a printk is displayed in the hope of
catching it if they ever are.
Someday I expect to just fail the registration on this condition.
I think this also addresses Michal's concern about a 'big hammer' as
it probably won't ever trigger now.
Regards,
Jason
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ef3973a5d34a94..f3aba7a970f576 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
struct mmu_notifier_subscriptions {
/* all mmu notifiers registered in this mm are queued in this list */
struct hlist_head list;
- bool has_itree;
+ u8 has_itree;
+ u8 no_blocking;
/* to serialize the list modifications and hlist_unhashed */
spinlock_t lock;
unsigned long invalidate_seq;
@@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
int ret = 0;
int id;
+ if (unlikely(subscriptions->no_blocking &&
+ !mmu_notifier_range_blockable(range)))
+ return -EAGAIN;
+
id = srcu_read_lock(&srcu);
hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
const struct mmu_notifier_ops *ops = subscription->ops;
@@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
srcu_read_unlock(&srcu, id);
}
+/*
+ * Add a hlist subscription to the list. The list is kept sorted by the
+ * existence of ops->invalidate_range_end. If there is more than one
+ * invalidate_range_end in the list then this process can no longer support
+ * non-blocking invalidation.
+ *
+ * non-blocking invalidation is problematic as a requirement to block results in
+ * the invalidation being aborted, however due to the use of RCU we have no
+ * reliable way to ensure that every sueessful invalidate_range_start() results
s/sueessful/successful/
+ * in a call to invalidate_range_end().
+ *
+ * Thus to support blocking only the last subscription in the list can have
+ * invalidate_range_end() set.
+ */
+static void
+mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
+ struct mmu_notifier *subscription)
We have mn_hlist_xxx in a number of places in mmu_notifier.c.
Seems like this should be named mn_hlist_add_subscription().
+{
+ struct mmu_notifier *last = NULL;
+ struct mmu_notifier *itr;
+
+ hlist_for_each_entry(itr, &subscriptions->list, hlist)
+ last = itr;
+
+ if (last && last->ops->invalidate_range_end &&
+ subscription->ops->invalidate_range_end) {
+ subscriptions->no_blocking = true;
+ pr_warn_once(
+ "%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",
line length?
+ current->comm, current->pid,
+ last->ops->invalidate_range_end,
+ subscription->ops->invalidate_range_end);
+ }
+ if (!last || !last->ops->invalidate_range_end)
+ subscriptions->no_blocking = false;
+
+ if (last && subscription->ops->invalidate_range_end)
+ hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
+ else
+ hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
+}
+
/*
* Same as mmu_notifier_register but here the caller must hold the mmap_sem in
* write mode. A NULL mn signals the notifier is being registered for itree
@@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
subscription->users = 1;
spin_lock(&mm->notifier_subscriptions->lock);
- hlist_add_head_rcu(&subscription->hlist,
- &mm->notifier_subscriptions->list);
+ mn_hist_add_subscription(mm->notifier_subscriptions,
+ subscription);
spin_unlock(&mm->notifier_subscriptions->lock);
} else
mm->notifier_subscriptions->has_itree = true;
Other than some nits, looks good to me so you can add:
Reviewed-by: Ralph Campbell <rcampbell@xxxxxxxxxx>