Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker

Jason Wang <jasowang@xxxxxxxxxx> · Mon, 5 Aug 2019 16:21:48 +0800

On 2019/8/5 2:28 PM, Michael S. Tsirkin wrote:
On Mon, Aug 05, 2019 at 12:33:45PM +0800, Jason Wang wrote:
On 2019/8/2 10:03 PM, Michael S. Tsirkin wrote:
On Fri, Aug 02, 2019 at 05:40:07PM +0800, Jason Wang wrote:
Btw, I came up with another idea: disable preemption when the vhost thread
needs to access the memory. Then register a preempt notifier, and if the vhost
thread is preempted, we're sure no one will access the memory and can do the
cleanup.
Great, more notifiers :(

Maybe we can live with
1- disable preemption while using the cached pointer
2- teach vhost to recover from memory access failures,
     by switching to regular from/to user path
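
A rough userspace model of steps 1-2 above, added for illustration only. This is
not code from the patch series: struct meta_map, meta_read(), meta_invalidate()
and slow_copy() are hypothetical names, and plain memcpy() stands in for both the
direct access and copy_from_user(). The comments mark where preempt_disable() and
preempt_enable() would sit in kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a cached direct mapping of guest metadata. */
struct meta_map {
    void *direct;      /* cached mapping; NULL once invalidated */
    const void *user;  /* backing "userspace" memory (always valid in this model) */
    size_t len;        /* size of the mapped region in bytes */
};

/* Stand-in for copy_from_user(); returns 0 on success. */
static int slow_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/* Models an MMU notifier invalidation tearing down the cached pointer. */
static void meta_invalidate(struct meta_map *m)
{
    m->direct = NULL;
}

/*
 * Read n bytes at offset off. Uses the cached pointer when it is still
 * valid (step 1), otherwise recovers through the regular copy path
 * (step 2). *used_fast reports which path was taken.
 */
static int meta_read(struct meta_map *m, size_t off, void *dst, size_t n,
                     bool *used_fast)
{
    if (off > m->len || n > m->len - off)
        return -1;  /* out of range */

    /* Kernel code would call preempt_disable() here. */
    void *p = m->direct;
    if (p) {
        memcpy(dst, (char *)p + off, n);
        /* ... and preempt_enable() here. */
        *used_fast = true;
        return 0;
    }
    /* ... and preempt_enable() here as well, before the slow path. */

    *used_fast = false;
    return slow_copy(dst, (const char *)m->user + off, n);
}
```

In this model the fallback is taken only when the pointer has already been
cleared; recovering from a fault in the middle of a direct access would need
exception handling that a userspace sketch cannot show.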

I don't get this. I believe we want to recover from the regular from/to user
path, don't we?
That (disable copy to/from user completely) would be a nice to have
since it would reduce the attack surface of the driver, but e.g. your
code already doesn't do that.

Yes, since it requires a lot of changes.

So if you want to try that, fine since it's a step in
the right direction.

But I think fundamentally it's not what we want to do long term.

Yes.

It's always been a fundamental problem with this patch series that only
metadata is accessed through a direct pointer.

The difference in the way you handle metadata and data is what is
now coming back and messing everything up.

I did propose something like this in the past:
https://www.spinics.net/lists/linux-virtualization/msg36824.html. But it looks
like you have some concerns about its locality.
Right, and it doesn't go away. You'll need to come up
with a test that messes it up and triggers a worst-case
scenario, so we can measure how bad that worst case is.

But the problem is still there: GUP can take a page fault, so we still need to
synchronize it with MMU notifiers.
I think the idea was, if GUP would need a pagefault, don't
do a GUP and do to/from user instead.

But this still needs to be synchronized with MMU notifiers (or use
dedicated work for GUP).

Hopefully that
will fault the page in and the next access will go through.

The solution might be something like
moving GUP to a dedicated kind of vhost work.
Right, generally GUP.

So if continuing the direct map approach,
what is needed is a cache of mapped VM memory, then on a cache miss
we'd queue work along the lines of 1-2 above.
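
One way to picture the cache-plus-deferred-work idea is the userspace model
below. It is only a sketch under assumptions: struct map_cache, cache_lookup(),
cache_run_work() and model_resolve() are hypothetical names, the cache is a tiny
direct-mapped table keyed by page number, and model_resolve() stands in for the
GUP and mapping step that a dedicated vhost work item would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4
#define QUEUE_SLOTS 8
#define MODEL_PAGE_SHIFT 12

/* Hypothetical cache of mapped pages plus a queue of pending misses. */
struct map_cache {
    uint64_t page[CACHE_SLOTS];     /* page number held in each slot */
    void *addr[CACHE_SLOTS];        /* mapped address; NULL means empty */
    uint64_t pending[QUEUE_SLOTS];  /* pages waiting for the worker */
    size_t npending;
};

/* Stand-in for "GUP + map": only pages 0 and 1 exist in this model. */
static char model_pages[2][1 << MODEL_PAGE_SHIFT];
static void *model_resolve(uint64_t page)
{
    return page < 2 ? (void *)model_pages[page] : NULL;
}

/* Fast path: return the mapped address, or NULL after queueing the miss. */
static void *cache_lookup(struct map_cache *c, uint64_t uaddr)
{
    uint64_t page = uaddr >> MODEL_PAGE_SHIFT;
    uint64_t off = uaddr & (((uint64_t)1 << MODEL_PAGE_SHIFT) - 1);
    size_t slot = (size_t)(page % CACHE_SLOTS);

    if (c->addr[slot] && c->page[slot] == page)
        return (char *)c->addr[slot] + off;

    /* Miss: remember the page once; the caller uses the slow path for now. */
    for (size_t i = 0; i < c->npending; i++)
        if (c->pending[i] == page)
            return NULL;
    if (c->npending < QUEUE_SLOTS)
        c->pending[c->npending++] = page;
    return NULL;
}

/* Deferred work: resolve queued pages and install them in the cache. */
static size_t cache_run_work(struct map_cache *c, void *(*resolve)(uint64_t))
{
    size_t filled = 0;

    for (size_t i = 0; i < c->npending; i++) {
        uint64_t page = c->pending[i];
        void *p = resolve(page);
        if (p) {
            size_t slot = (size_t)(page % CACHE_SLOTS);
            c->page[slot] = page;
            c->addr[slot] = p;
            filled++;
        }
    }
    c->npending = 0;
    return filled;
}
```

The model leaves out invalidation entirely; in the kernel the cache would also
have to be cleared from the MMU notifier, which is the synchronization problem
this thread is about.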

That's one direction to take. Another one is to give up on that and
write our own version of uaccess macros.  Add a "high security" flag to
the vhost module and if not active use these for userspace memory
access.

Or using SET_BACKEND_FEATURES?
No, I don't think it's considered best practice to allow unprivileged
userspace control over whether the kernel enables security features.

Got it.

But do you mean permanent GUP, as I did in the
original RFC https://lkml.org/lkml/2018/12/13/218?

Thanks
Permanent GUP breaks THP and NUMA.

Yes.

Thanks