On 2019/8/2 下午8:46, Jason Gunthorpe wrote:
On Fri, Aug 02, 2019 at 05:40:07PM +0800, Jason Wang wrote:
This must be a proper barrier, like a spinlock, mutex, or
synchronize_rcu.
I start with synchronize_rcu() but both you and Michael raise some
concern.
I've also idly wondered if calling synchronize_rcu() under the various
mm locks is a deadlock situation.
Maybe, that's why I suggest to use vhost_work_flush() which is much
lightweight can can achieve the same function. It can guarantee all
previous work has been processed after vhost_work_flush() return.
Then I try spinlock and mutex:
1) spinlock: add lots of overhead on datapath, this leads 0 performance
improvement.
I think the topic here is correctness not performance improvement
But the whole series is to speed up vhost.
2) SRCU: full memory barrier requires on srcu_read_lock(), which still leads
little performance improvement
3) mutex: a possible issue is need to wait for the page to be swapped in (is
this unacceptable ?), another issue is that we need hold vq lock during
range overlap check.
I have a feeling that mmu notififers cannot safely become dependent on
progress of swap without causing deadlock. You probably should avoid
this.
Yes, so that's why I try to synchronize the critical region by myself.
And, again, you can't re-invent a spinlock with open coding and get
something better.
So the question is if waiting for swap is considered to be unsuitable for
MMU notifiers. If not, it would simplify codes. If not, we still need to
figure out a possible solution.
Btw, I come up another idea, that is to disable preemption when vhost thread
need to access the memory. Then register preempt notifier and if vhost
thread is preempted, we're sure no one will access the memory and can do the
cleanup.
I think you should use the spinlock so at least the code is obviously
functionally correct and worry about designing some properly justified
performance change after.
Jason
Spinlock is correct but make the whole series meaningless consider it
won't bring any performance improvement.
Thanks