Re: [RFC PATCH] iommu/vt-d: Add IOTLB flush support for kernel addresses

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Fri, 23 Oct 2015 12:37:06 +0100

On Fri, 2015-10-23 at 13:03 +0200, Joerg Roedel wrote:
> On Fri, Oct 23, 2015 at 11:33:33AM +0100, David Woodhouse wrote:
> > Which means I'm pondering *renaming* tlb_flush_kernel_range() to
> > something like arch_tlb_flush_kernel_range() everywhere, then
> > having a
> > tlb_flush_kernel_range() inline function which optionally calls
> > iommu_flush_kernel_range() first.
> 
> That sounds like some work, but would be the super clean solution :)
> Given that only a handful of architecture besides x86 will need it
> (thinking of ARM64 and PPC), I prefer the solution below:
> 
> > Or I could reduce the churn by adding explicit calls to
> > iommu_flush_kernel_range() at every location that calls
> > tlb_flush_kernel_range(), but that's going to lead to some callers
> > missing the IOMMU flush.
> 
> Exactly like this, but when do we miss a flush here?

When we miss a call to flush_tlb_kernel_range() and forget to add the
corresponding call to iommu_flush_kernel_range() next to it. We'd need
to go trawling through arch code and make sure we get them all.

Or more likely, when someone adds a *new* call to
flush_tlb_kernel_range() and doesn't realise that they also need to
call iommu_flush_kernel_range().

It's more work up front to do the "super clean solution", but I think
it's worth it.

> > Not entirely. The device still gets to specify whether it's doing
> > supervisor or user mode access, for each request it makes. It doesn't
> > open the door to users just using kernel addresses and getting away
> > with it!
> > 
> > Sure, we need to trust the *device* — but we need to trust it to
> > provide the correct PASID too. Which basically means in the VFIO case
> > where the user gets *full* control of the device, we have to ensure
> > that it gets its own PASID table with only the *one* PASID in it, and
> > *that* PASID can't have supervisor mode.
> 
> Exactly, we need to trust the device and the device driver. But thats
> not different to a situation without an iommu. We just run into problems
> when a device-driver allows sending requests to access kernel-memory
> from user-space, so it needs more care from the driver writers/reviewers
> too.

It's more than that — it's equivalent to the situation *with* the
IOMMU.

Having a *separate* PASID which is the only PASID we can use for kernel
mode is *not* a security improvement. In the general case, if a user
can trick the device into setting the 'supervisor mode' bit on a given
access, it could probably just as easily trick the device into using
the separate kernel PASID for that access. In neither case is it as
simple as just asking the device to use a kernel address.

I'm not proposing it for that reason, which is why I'm objecting to
your 'we have to...' response. Although maybe I should shut up, because
I'm pleased you aren't objecting to my plan and saying that we *do*
need to permit supervisor-mode access in normal PASIDs.

> > +static inline void do_iommu_flush_ktlb(unsigned long start,
> > unsigned long end)
> > +{
> > +	iommu_flush_ktlb_fn *fn;
> > +	rcu_read_lock();
> > +	fn = rcu_dereference(iommu_flush_ktlb);
> > +	if (fn)
> > +		(*fn)(start, end);
> > +	rcu_read_unlock();
> > +}
> 
> Yes, that'll work too. When you read/update the iommu_flush_ktlb pointer
> atomically, you can even get away without rcu. The function it points to
> will not go away, so you can still call it even when the pointer turned
> NULL.

That's true. I do need to think hard about architectures with function
descriptors though, where it might not be possible to atomically update
a function pointer. Need to dust off my rusty PPC64 knowledge... :)

(Note that this could screw the RCU approach too.)

> > Maybe we could keep it simple and just declare that once the function
> > pointer is set, it may never be cleared? But I think we really do want
> > to avoid the out-of-line function call altogether in the case where
> > kernel PASIDs are not being used. Or at *least* the case where SVM
> > isn't being used at all.
> 
> Yes, thats why I think an inline function which does the checks would be
> a better solution. The mmu_notifiers implement it in the same way, so we
> would stay consistent with them.

You mean an inline function which checks for iommu->kernel_svm ∀ iommu?
And does the equivalent for other IOMMUs? I wouldn't want IOMMU
-specific code in there; just a decision about whether to call the out
-of-line function.

Or maybe if we are making PASID handling generic and system-wide, it
really does become a case of 'if (init_mm.pasid != -1)' ...?

-- 
dwmw2

Attachment:
smime.p7s

Description: S/MIME cryptographic signature