Re: [PATCH V1 1/2] mm: Move all mmu notifier invocations to be done outside the PT lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue,  4 Sep 2012 11:41:20 +0300
Haggai Eran <haggaie@xxxxxxxxxxxx> wrote:

> From: Sagi Grimberg <sagig@xxxxxxxxxxxx>
> 
> In order to allow sleeping during mmu notifier calls, we need to avoid
> invoking them under the page table spinlock. This patch solves the
> problem by calling invalidate_page notification after releasing
> the lock (but before freeing the page itself), or by wrapping the page
> invalidation with calls to invalidate_range_begin and
> invalidate_range_end.
> 
>
> ...
>
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -868,12 +868,14 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
>  		cond_resched();
>  	}
>  
> +	mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE);
> +
>  	spin_lock(&mm->page_table_lock);
>  	if (unlikely(!pmd_same(*pmd, orig_pmd)))
>  		goto out_free_pages;
>  	VM_BUG_ON(!PageHead(page));
>  
> -	pmdp_clear_flush_notify(vma, haddr, pmd);
> +	pmdp_clear_flush(vma, haddr, pmd);
>  	/* leave pmd empty until pte is filled */
>  
>  	pgtable = get_pmd_huge_pte(mm);
> @@ -896,6 +898,9 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
>  	page_remove_rmap(page);
>  	spin_unlock(&mm->page_table_lock);
>  
> +	mmu_notifier_invalidate_range_end(vma->vm_mm, haddr,
> +					  haddr + HPAGE_PMD_SIZE);

But `haddr' has been altered by the time we get here.  We should have
saved the original value?

That's a thing I don't like about this patchset - it adds maintenance
overhead.  This need to retain values of local variables or incoming
args across lengthy code sequences is fragile.  We could easily break
your changes as the core code evolves, and it would take a long long
time before anyone noticed the breakage.

I'm wondering if it would be better to adopt the convention throughout
this patchset that mmu_notifier_invalidate_range_start() and
mmu_notifier_invalidate_range_end() always use their own locals.  ie:

	unsigned long mmun_start;	/* For mmu_notifiers */
	unsigned long mmun_end;		/* For mmu_notifiers */

	...

	mmun_start = ...;
	mmun_end = ...;

	...

	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

	...

	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);


This is verbose, but it is explicit and clear and more robust than what
you have.  It shouldn't generate any additional text or stack usage or
register usage unless gcc is having an especially stupid day.


>  	ret |= VM_FAULT_WRITE;
>  	put_page(page);
>  
>
> ...
>
> @@ -1382,7 +1391,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
>  	spinlock_t *ptl;
>  	struct page *page;
>  	unsigned long address;
> -	unsigned long end;
> +	unsigned long start, end;

You'll note that this function uses the one-definition-per-line
convention, which has a few (smallish) advantages over
multiple-definitions-per-line.  One such advantage is that it leaves
room for a nice little comment at the RHS.  Take that as a hint ;)

>  	int ret = SWAP_AGAIN;
>  	int locked_vma = 0;
>  
> @@ -1405,6 +1414,9 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
>  	if (!pmd_present(*pmd))
>  		return ret;
>  
> +	start = address;
> +	mmu_notifier_invalidate_range_start(mm, start, end);

`end' is used uninitialised in this function.

I'm surprised that it didn't generate a warning(?) and I worry about
the testing coverage?

>  	/*
>  	 * If we can acquire the mmap_sem for read, and vma is VM_LOCKED,
>  	 * keep the sem while scanning the cluster for mlocking pages.
>
> ...
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]