Re: WARNING in __mmdrop

Jason Wang <jasowang@xxxxxxxxxx> · Fri, 26 Jul 2019 20:53:18 +0800

On 2019/7/26 下午8:38, Michael S. Tsirkin wrote:
On Fri, Jul 26, 2019 at 08:00:58PM +0800, Jason Wang wrote:
On 2019/7/26 下午7:49, Michael S. Tsirkin wrote:
On Thu, Jul 25, 2019 at 10:25:25PM +0800, Jason Wang wrote:
On 2019/7/25 下午9:26, Michael S. Tsirkin wrote:
Exactly, and that's the reason actually I use synchronize_rcu() there.

So the concern is still the possible synchronize_expedited()?
I think synchronize_srcu_expedited.

synchronize_expedited sends lots of IPI and is bad for realtime VMs.

Can I do this
on through another series on top of the incoming V2?

Thanks

The question is this: is this still a gain if we switch to the
more expensive srcu? If yes then we can keep the feature on,
I think we only care about the cost on srcu_read_lock() which looks pretty
tiny form my point of view. Which is basically a READ_ONCE() + WRITE_ONCE().

Of course I can benchmark to see the difference.

if not we'll put it off until next release and think
of better solutions. rcu->srcu is just a find and replace,
don't see why we need to defer that. can be a separate patch
for sure, but we need to know how well it works.
I think I get here, let me try to do that in V2 and let's see the numbers.

Thanks

It looks to me for tree rcu, its srcu_read_lock() have a mb() which is too
expensive for us.
I will try to ponder using vq lock in some way.
Maybe with trylock somehow ...

Ok, let me retry if necessary (but I do remember I end up with deadlocks 
last try).

If we just worry about the IPI,
With synchronize_rcu what I would worry about is that guest is stalled

Can this synchronize_rcu() be triggered by guest? If yes, there are 
several other MMU notifiers that can block. Is vhost something special here?

because system is busy because of other guests.
With expedited it's the IPIs...

The current synchronize_rcu()  can force a expedited grace period:

void synchronize_rcu(void)
{
        ...
        if (rcu_blocking_is_gp())
return;
        if (rcu_gp_is_expedited())
synchronize_rcu_expedited();
else
wait_rcu_gp(call_rcu);
}
EXPORT_SYMBOL_GPL(synchronize_rcu);

can we do something like in
vhost_invalidate_vq_start()?

         if (map) {
                 /* In order to avoid possible IPIs with
                  * synchronize_rcu_expedited() we use call_rcu() +
                  * completion.
*/
init_completion(&c.completion);
                 call_rcu(&c.rcu_head, vhost_finish_vq_invalidation);
wait_for_completion(&c.completion);
                 vhost_set_map_dirty(vq, map, index);
vhost_map_unprefetch(map);
         }

?
Why would that be faster than synchronize_rcu?

No faster but no IPI.

There's one other thing that bothers me, and that is that
for large rings which are not physically contiguous
we don't implement the optimization.

For sure, that can wait, but I think eventually we should
vmap large rings.

Yes, worth to try. But using direct map has its own advantage: it can use
hugepage that vmap can't

Thanks
Sure, so we can do that for small rings.

Yes, that's possible but should be done on top.

Thanks