+ mmu_notifier-add-mmu_notifier_invalidate_range.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mmu_notifier: add mmu_notifier_invalidate_range()
has been added to the -mm tree.  Its filename is
     mmu_notifier-add-mmu_notifier_invalidate_range.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mmu_notifier-add-mmu_notifier_invalidate_range.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mmu_notifier-add-mmu_notifier_invalidate_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joerg Roedel <jroedel@xxxxxxx>
Subject: mmu_notifier: add mmu_notifier_invalidate_range()

This notifier closes an important gap in the current mmu_notifier
implementation, the existing callbacks are called too early or too late to
reliably manage a non-CPU TLB.  Specifically, invalidate_range_start() is
called when all pages are still mapped and invalidate_range_end() when all
pages are unmapped and potentially freed.

This is fine when the users of the mmu_notifiers manage their own SoftTLB,
like KVM does.  When the TLB is managed in software it is easy to wipe out
entries for a given range and prevent new entries to be established until
invalidate_range_end is called.

But when the user of mmu_notifiers has to manage a hardware TLB it can
still wipe out TLB entries in invalidate_range_start, but it can't make
sure that no new TLB entries in the given range are established between
invalidate_range_start and invalidate_range_end.

To avoid silent data corruption the entries in the non-CPU TLB need to be
flushed when the pages are unmapped (at this point in time no _new_ TLB
entries can be established in the non-CPU TLB) but not yet freed (as the
non-CPU TLB may still have _existing_ entries pointing to the pages about
to be freed).

To fix this problem we need to catch the moment when the Linux VMM flushes
remote TLBs (as a non-CPU TLB is not very CPU TLB), as this is the point
in time when the pages are unmapped but _not_ yet freed.

The mmu_notifier_invalidate_range() function aims to catch that moment.

IOMMU code will be one user of the notifier-callback.  Currently this is
only the AMD IOMMUv2 driver, but its code is about to be more generalized
and converted to a generic IOMMU-API extension to fit the needs of similar
functionality in other IOMMUs as well.

The current attempt in the AMD IOMMUv2 driver to work around the
invalidate_range_start/end() shortcoming is to assign an empty page table
to the non-CPU TLB between any invalidata_range_start/end calls.  With the
empty page-table assigned, every page-table walk to re-fill the non-CPU
TLB will cause a page-fault reported to the IOMMU driver via an interrupt,
possibly causing interrupt storms.

The page-fault handler in the AMD IOMMUv2 driver doesn't handle the fault
if an invalidate_range_start/end pair is active, it just reports back
SUCESS to the device and let it refault the page.  But existing hardware
(newer Radeon GPUs) that makes use of this feature don't re-fault
indefinitly, after a certain number of faults for the same address the
device enters a failure state and needs to be resetted.

To avoid the GPUs entering a failure state we need to get rid of the
empty-page-table workaround and use the mmu_notifier_invalidate_range()
function introduced with this patch.

Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Reviewed-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Johannes Weiner <jweiner@xxxxxxxxxx>
Cc: Jay Cornwall <Jay.Cornwall@xxxxxxx>
Cc: Oded Gabbay <Oded.Gabbay@xxxxxxx>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@xxxxxxx>
Cc: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx>
Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/mmu_notifier.h |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff -puN include/linux/mmu_notifier.h~mmu_notifier-add-mmu_notifier_invalidate_range include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h~mmu_notifier-add-mmu_notifier_invalidate_range
+++ a/include/linux/mmu_notifier.h
@@ -242,6 +242,11 @@ static inline void mmu_notifier_invalida
 		__mmu_notifier_invalidate_range_end(mm, start, end);
 }
 
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+}
+
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
 {
 	mm->mmu_notifier_mm = NULL;
@@ -341,6 +346,11 @@ static inline void mmu_notifier_invalida
 				  unsigned long start, unsigned long end)
 {
 }
+
+static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
+				  unsigned long start, unsigned long end)
+{
+}
 
 static inline void mmu_notifier_mm_init(struct mm_struct *mm)
 {
_

Patches currently in -mm which might be from jroedel@xxxxxxx are

mmu_notifier-add-mmu_notifier_invalidate_range.patch
mmu_notifier-call-mmu_notifier_invalidate_range-from-vmm.patch
mmu_notifier-add-the-call-back-for-mmu_notifier_invalidate_range.patch
linux-next.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux