Re: Dirty/Access bits vs. page content

On Thu, Apr 24, 2014 at 1:02 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> There is no need to free all the pages immediately after doing the
> TLB flush: that's merely how it's structured at present; page freeing
> can be left until the end as now, or when out from under the spinlock.

Hmm. In fact, if we do the actual TLB flush while still under the ptl
lock, the current code will "just work". We can just keep the
set_page_dirty() in the scanning part, because there's no race with
mkclean() as long as we hold the lock.

So all that requires is splitting our current "tlb_flush_mmu()" into
the actual TLB flushing part and the free_pages_and_swap_cache()
part. And then we do the TLB flushing inside the ptl, to make sure
that we flush TLBs before anybody can do mkclean().

And then we make the "set_page_dirty()" case force a TLB flush (but
*not* force breaking out of the loop).
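
Roughly, the ordering this relies on looks like the sketch below (just
an illustration of the idea - the real code is the zap_pte_range()
change in the attached patch, and tlb_flush_mmu_tlbonly()/
tlb_flush_mmu_free() are the two new halves it introduces):

	spin_lock(ptl);			/* scanning loop runs under ptl */
	...
	set_page_dirty(page);		/* safe: mkclean() also needs ptl */
	force_flush = 1;		/* must flush before dropping ptl */
	...
	if (force_flush)
		tlb_flush_mmu_tlbonly(tlb);	/* shoot down stale TLB entries */
	pte_unmap_unlock(start_pte, ptl);	/* only now can mkclean() run */
	if (force_flush)
		tlb_flush_mmu_free(tlb);	/* batched freeing, outside the lock */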

This gives us the best of all worlds:

 - maximum batching for the common case (no shared dirty pte entries)

 - if we find any dirty page table entries, we will batch as much as
we can within the ptl lock

 - we do the TLB shootdown holding the page table lock (but that's not
new - ptep_clear_flush() does the same)

 - but we do the batched freeing of pages outside the lock

 - and the patch is pretty simple too (no need for the "one dirty bit
in the 'struct page *' pointer" games)

IOW, how about the attached patch, which entirely replaces my previous
two patches? DaveH - does this fix your test-case, while _not_
introducing any new BUG_ON() triggers?

I didn't test the patch, so maybe I did something stupid. It compiles
for me, but it only covers the HAVE_GENERIC_MMU_GATHER case;
introducing tlb_flush_mmu_tlbonly() and tlb_flush_mmu_free() into the
non-generic cases should be trivial, since they really are just that
old "tlb_flush_mmu()" function split up (the tlb_flush_mmu() function
remains available for other non-forced flush users).

So assuming this does work for DaveH, then the arm/ia64/um/whatever
people would need to do those trivial transforms too, but it really
shouldn't be too painful.
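
For a hypothetical non-generic mmu_gather implementation whose
tlb_flush_mmu() is the usual "flush, then free the batched pages"
sequence, the split is purely mechanical - something like the sketch
below (the tlb->pages/tlb->nr fields are illustrative, not any
particular architecture's):

	static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
	{
		tlb_flush(tlb);		/* arch-specific TLB invalidate */
	}

	static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
	{
		/* free whatever pages were batched up */
		free_pages_and_swap_cache(tlb->pages, tlb->nr);
		tlb->nr = 0;
	}

	static inline void tlb_flush_mmu(struct mmu_gather *tlb)
	{
		tlb_flush_mmu_tlbonly(tlb);
		tlb_flush_mmu_free(tlb);
	}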

Comments? DaveH?

               Linus
 mm/memory.c | 53 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 93e332d5ed77..037b812a9531 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -232,17 +232,18 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long
 #endif
 }
 
-void tlb_flush_mmu(struct mmu_gather *tlb)
+static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
-	struct mmu_gather_batch *batch;
-
-	if (!tlb->need_flush)
-		return;
 	tlb->need_flush = 0;
 	tlb_flush(tlb);
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb_table_flush(tlb);
 #endif
+}
+
+static void tlb_flush_mmu_free(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
 
 	for (batch = &tlb->local; batch; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
@@ -251,6 +252,14 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
 	tlb->active = &tlb->local;
 }
 
+void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+	if (!tlb->need_flush)
+		return;
+	tlb_flush_mmu_tlbonly(tlb);
+	tlb_flush_mmu_free(tlb);
+}
+
 /* tlb_finish_mmu
  *	Called at the end of the shootdown operation to free up any resources
  *	that were required.
@@ -1127,8 +1136,10 @@ again:
 			if (PageAnon(page))
 				rss[MM_ANONPAGES]--;
 			else {
-				if (pte_dirty(ptent))
+				if (pte_dirty(ptent)) {
+					force_flush = 1;
 					set_page_dirty(page);
+				}
 				if (pte_young(ptent) &&
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
@@ -1137,9 +1148,10 @@ again:
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
-			force_flush = !__tlb_remove_page(tlb, page);
-			if (force_flush)
+			if (unlikely(!__tlb_remove_page(tlb, page))) {
+				force_flush = 1;
 				break;
+			}
 			continue;
 		}
 		/*
@@ -1174,18 +1186,11 @@ again:
 
 	add_mm_rss_vec(mm, rss);
 	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(start_pte, ptl);
 
-	/*
-	 * mmu_gather ran out of room to batch pages, we break out of
-	 * the PTE lock to avoid doing the potential expensive TLB invalidate
-	 * and page-free while holding it.
-	 */
+	/* Do the actual TLB flush before dropping ptl */
 	if (force_flush) {
 		unsigned long old_end;
 
-		force_flush = 0;
-
 		/*
 		 * Flush the TLB just for the previous segment,
 		 * then update the range to be the remaining
@@ -1193,11 +1198,21 @@ again:
 		 */
 		old_end = tlb->end;
 		tlb->end = addr;
-
-		tlb_flush_mmu(tlb);
-
+		tlb_flush_mmu_tlbonly(tlb);
 		tlb->start = addr;
 		tlb->end = old_end;
+	}
+	pte_unmap_unlock(start_pte, ptl);
+
+	/*
+	 * If we forced a TLB flush (either due to running out of
+	 * batch buffers or because we needed to flush dirty TLB
+	 * entries before releasing the ptl), free the batched
+	 * memory too. Restart if we didn't do everything.
+	 */
+	if (force_flush) {
+		force_flush = 0;
+		tlb_flush_mmu_free(tlb);
 
 		if (addr != end)
 			goto again;
