Re: [RFC PATCH] iommu/vt-d: Add IOTLB flush support for kernel addresses

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Fri, 23 Oct 2015 11:33:33 +0100

On Fri, 2015-10-23 at 12:20 +0200, Joerg Roedel wrote:
> On Tue, Oct 20, 2015 at 05:17:04PM +0100, David Woodhouse wrote:
> > Can we assume that only one type of SVM-capable IOMMU will be present
> > in the system at a time? Perhaps we could just register a single
> > function (intel_iommu_flush_kernel_pasid in the VT-d case) to be used
> > as a notifier callback from tlb_flush_kernel_range()? Rather than the
> > overhead of a *list* of notifiers?
> 
> Yes, a single notifier is certainly preferable to a list. It is just
> too easy for others to attach to this list silently and adding more
> overhead to kernel TLB flushing.

Yeah. It's easy enough to add that in the x86 tlb_flush_kernel_range()
but I think we actually want it to be cross-platform.

Which means I'm pondering *renaming* tlb_flush_kernel_range() to
something like arch_tlb_flush_kernel_range() everywhere, then having a
tlb_flush_kernel_range() inline function which optionally calls
iommu_flush_kernel_range() first.

Or I could reduce the churn by adding explicit calls to
iommu_flush_kernel_range() at every location that calls
tlb_flush_kernel_range(), but that's going to lead to some callers
missing the IOMMU flush.

> > But... that's because the PASID-space is currently per-IOMMU. The plan
> > is to have a *single* PASID-space system-wide, And then I still want to
> > retain the property that there can be only *one* kernel PASID.
> 
> That makes a lot of sense. Then we can check in the call-back simply if
> this pasid has users and bail out early when not.
> 
> > I have forbidden the use of a given PASID to access *both* kernel and
> > user addresses. I'm hoping we can get away with putting that
> > restriction into the generic SVM APIs.
> 
> We have to, having kernel-pasids already nullifies all protection the
> IOMMU provides, giving kernel-access to a process-pasid is security wise
> equivalent to accessing /dev/mem.

Not entirely. The device still gets to specify whether it's doing
supervisor or user mode access, for each request it makes. It doesn't
open the door to users just using kernel addresses and getting away
with it!

Sure, we need to trust the *device* — but we need to trust it to
provide the correct PASID too. Which basically means in the VFIO case
where the user gets *full* control of the device, we have to ensure
that it gets its own PASID table with only the *one* PASID in it, and
*that* PASID can't have supervisor mode.

But in the general case, apart from the fact that it makes life hard
for us, there's no fundamental security reason why we couldn't set the
bit which allows supervisor mode access to happen in *any* PASID.

> > So yeah, perhaps we can set the notifier pointer to NULL when there's
> > no kernel PASID assigned, and only set it to point to
> > ${MY_IOMMU}_flush_kernel_pasid() if/when there *is* one?
> 
> That sounds like it needs some clever locking. Instead of checking the
> function pointer it is probably easier to put the check for pasid-users
> into an inline function and just do the real flush-call only when
> necessary.

The locking isn't hard; it's just RCU. Which in the VT-d case is just
the same as the handling of the kernel PASID structure which tells us
if we have any work to do anyway.

What I have in my working tree right now (but will probably throw away)
looks something like...

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6df2029..c9e0b6c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -7,6 +7,22 @@
 #include <asm/processor.h>
 #include <asm/special_insns.h>
 
+#ifdef CONFIG_IOMMU_TLB_FLUSH
+#include <linux/rcupdate.h>
+typedef void (*iommu_flush_ktlb_fn)(unsigned long start, unsigned long end);
+extern iommu_flush_ktlb_fn __rcu iommu_flush_ktlb;
+
+static inline void do_iommu_flush_ktlb(unsigned long start, unsigned long end)
+{
+	iommu_flush_ktlb_fn *fn;
+	rcu_read_lock();
+	fn = rcu_dereference(iommu_flush_ktlb);
+	if (fn)
+		(*fn)(start, end);
+	rcu_read_unlock();
+}
+#endif
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else
@@ -223,6 +239,9 @@ static inline void reset_lazy_tlbstate(void)
 static inline void flush_tlb_kernel_range(unsigned long start,
 					  unsigned long end)
 {
+#ifdef CONFIG_IOMMU_FLUSH_TLB
+	do_iommu_flush_ktlb(start, end);
+#endif
 	flush_tlb_all();
 }
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8ddb5d0..4122b49 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -266,6 +267,9 @@ static void do_kernel_range_flush(void *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_IOMMU_FLUSH_TLB
+	do_iommu_flush_ktlb(start, end);
+#endif
 
 	/* Balance as user space task's flush, a bit conservative */
 	if (end == TLB_FLUSH_ALL ||


Maybe we could keep it simple and just declare that once the function pointer is set, it may never be cleared? But I think we really do want to avoid the out-of-line function call altogether in the case where kernel PASIDs are not being used. Or at *least* the case where SVM isn't being used at all.

-- 
-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@xxxxxxxxx                              Intel Corporation

Attachment:
smime.p7s

Description: S/MIME cryptographic signature