Re: [PATCH 02/36] mmu_notifier: keep track of active invalidation ranges v3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 21 May 2015, j.glisse@xxxxxxxxx wrote:

> From: Jérôme Glisse <jglisse@xxxxxxxxxx>
> 
> The mmu_notifier_invalidate_range_start() and mmu_notifier_invalidate_range_end()
> can be considered as forming an "atomic" section for the cpu page table update
> point of view. Between this two function the cpu page table content is unreliable
> for the address range being invalidated.
> 
> Current user such as kvm need to know when they can trust the content of the cpu
> page table. This becomes even more important to new users of the mmu_notifier
> api (such as HMM or ODP).
> 
> This patch use a structure define at all call site to invalidate_range_start()
> that is added to a list for the duration of the invalidation. It adds two new
> helpers to allow querying if a range is being invalidated or to wait for a range
> to become valid.
> 
> For proper synchronization, user must block new range invalidation from inside
> there invalidate_range_start() callback, before calling the helper functions.
> Otherwise there is no garanty that a new range invalidation will not be added
> after the call to the helper function to query for existing range.

Hi Jerome,

Most of this information will make nice block comments for the new helper 
routines. I can help tighten up the writing slightly, but first:

Question: in hmm.c's hmm_notifier_invalidate function (looking at the 
entire patchset, for a moment), I don't see any blocking of new range 
invalidations, even though you point out, above, that this is required. Am 
I missing it, and if so, where should I be looking instead?

> 
> Changed since v1:
>   - Fix a possible deadlock in mmu_notifier_range_wait_valid()
> 
> Changed since v2:
>   - Add the range to invalid range list before calling ->range_start().
>   - Del the range from invalid range list after calling ->range_end().
>   - Remove useless list initialization.
> 
> Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
> Reviewed-by: Haggai Eran <haggaie@xxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  9 ++--
>  drivers/gpu/drm/radeon/radeon_mn.c      | 14 +++---
>  drivers/infiniband/core/umem_odp.c      | 16 +++----
>  drivers/misc/sgi-gru/grutlbpurge.c      | 15 +++----
>  drivers/xen/gntdev.c                    | 15 ++++---
>  fs/proc/task_mmu.c                      | 11 +++--
>  include/linux/mmu_notifier.h            | 55 ++++++++++++-----------
>  kernel/events/uprobes.c                 | 13 +++---
>  mm/huge_memory.c                        | 78 ++++++++++++++------------------
>  mm/hugetlb.c                            | 55 ++++++++++++-----------
>  mm/ksm.c                                | 28 +++++-------
>  mm/madvise.c                            | 20 ++++-----
>  mm/memory.c                             | 72 +++++++++++++++++-------------
>  mm/migrate.c                            | 36 +++++++--------
>  mm/mmu_notifier.c                       | 79 ++++++++++++++++++++++++++++-----
>  mm/mprotect.c                           | 18 ++++----
>  mm/mremap.c                             | 14 +++---
>  virt/kvm/kvm_main.c                     | 10 ++---
>  18 files changed, 302 insertions(+), 256 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 452e9b1..80fe72a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -131,16 +131,15 @@ restart:
>  
>  static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>  						       struct mm_struct *mm,
> -						       unsigned long start,
> -						       unsigned long end,
> -						       enum mmu_event event)
> +						       const struct mmu_notifier_range *range)
>  {
>  	struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
>  	struct interval_tree_node *it = NULL;
> -	unsigned long next = start;
> +	unsigned long next = range->start;
>  	unsigned long serial = 0;
> +	/* interval ranges are inclusive, but invalidate range is exclusive */
> +	unsigned long end = range->end - 1, start = range->start;


A *very* minor point, but doing it that way messes up the scope of the 
comment. Something more like this might be cleaner:

unsigned long start = range->start;
unsigned long next = start;
unsigned long serial = 0;
/* interval ranges are inclusive, but invalidate range is exclusive */
unsigned long end = range->end - 1;


[...]

> -					   enum mmu_event event)
> +					   struct mmu_notifier_range *range)
>  
>  {
>  	struct mmu_notifier *mn;
>  	int id;
>  
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	list_add_tail(&range->list, &mm->mmu_notifier_mm->ranges);
> +	mm->mmu_notifier_mm->nranges++;


Is this missing a call to wake_up(&mm->mmu_notifier_mm->wait_queue)? If 
not, then it would be helpful to explain why that's only required for 
nranges--, and not for the nranges++ case. The helper routine is merely 
waiting for nranges to *change*, not looking for greater than or less 
than.


> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>  		if (mn->ops->invalidate_range_start)
> -			mn->ops->invalidate_range_start(mn, mm, start,
> -							end, event);
> +			mn->ops->invalidate_range_start(mn, mm, range);
>  	}
>  	srcu_read_unlock(&srcu, id);
>  }
>  EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_start);
>  
>  void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
> -					 unsigned long start,
> -					 unsigned long end,
> -					 enum mmu_event event)
> +					 struct mmu_notifier_range *range)
>  {
>  	struct mmu_notifier *mn;
>  	int id;
> @@ -211,12 +211,23 @@ void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
>  		 * (besides the pointer check).
>  		 */
>  		if (mn->ops->invalidate_range)
> -			mn->ops->invalidate_range(mn, mm, start, end);
> +			mn->ops->invalidate_range(mn, mm,
> +						  range->start, range->end);
>  		if (mn->ops->invalidate_range_end)
> -			mn->ops->invalidate_range_end(mn, mm, start,
> -						      end, event);
> +			mn->ops->invalidate_range_end(mn, mm, range);
>  	}
>  	srcu_read_unlock(&srcu, id);
> +
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	list_del_init(&range->list);
> +	mm->mmu_notifier_mm->nranges--;
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> +	/*
> +	 * Wakeup after callback so they can do their job before any of the
> +	 * waiters resume.
> +	 */
> +	wake_up(&mm->mmu_notifier_mm->wait_queue);
>  }
>  EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range_end);
>  
> @@ -235,6 +246,49 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
>  }
>  EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
>  


We definitely want to put a little documentation here.


> +static bool mmu_notifier_range_is_valid_locked(struct mm_struct *mm,
> +					       unsigned long start,
> +					       unsigned long end)


This routine is named "_range_is_valid_", but it takes in an implicit 
range (start, end), and also a list of ranges (buried in mm), and so it's 
a little confusing. I'd like to consider *maybe* changing either the name, 
or the args (range* instead of start, end?), or something.

Could you please say a few words about the intent of this routine, to get 
us started there?


> +{
> +	struct mmu_notifier_range *range;
> +
> +	list_for_each_entry(range, &mm->mmu_notifier_mm->ranges, list) {
> +		if (!(range->end <= start || range->start >= end))
> +			return false;


This has a lot of negatives in it, if you count the innermost "not in 
range" expression. It can be simplified to this:

if(range->end > start && range->start < end)
	return false;


> +	}
> +	return true;
> +}
> +
> +bool mmu_notifier_range_is_valid(struct mm_struct *mm,
> +				 unsigned long start,
> +				 unsigned long end)
> +{
> +	bool valid;
> +
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	valid = mmu_notifier_range_is_valid_locked(mm, start, end);
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +	return valid;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_is_valid);
> +
> +void mmu_notifier_range_wait_valid(struct mm_struct *mm,
> +				   unsigned long start,
> +				   unsigned long end)
> +{
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
> +		int nranges = mm->mmu_notifier_mm->nranges;
> +
> +		spin_unlock(&mm->mmu_notifier_mm->lock);
> +		wait_event(mm->mmu_notifier_mm->wait_queue,
> +			   nranges != mm->mmu_notifier_mm->nranges);
> +		spin_lock(&mm->mmu_notifier_mm->lock);
> +	}
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_wait_valid);
> +
>  static int do_mmu_notifier_register(struct mmu_notifier *mn,
>  				    struct mm_struct *mm,
>  				    int take_mmap_sem)
> @@ -264,6 +318,9 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
>  	if (!mm_has_notifiers(mm)) {

[...]

That's all I could see to mention for this one, thanks,

john h

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]