On Thu, Jul 30, 2020 at 02:03:02PM -0700, Sagi Grimberg wrote:
> > > > > > > > I think it will be a significant improvement to have a single code path.
> > > > > > > > The code will be more robust and we won't need to face issues that are
> > > > > > > > specific for blocking.
> > > > > > > >
> > > > > > > > If the cost is negligible, I think the upside is worth it.
> > > > > > > >
> > > > > > >
> > > > > > > rcu_read_lock and rcu_read_unlock have been proven efficient enough,
> > > > > > > and I don't think percpu_refcount is better than them, so I'd suggest
> > > > > > > not switching non-blocking over to it.
> > > > > >
> > > > > > It's not a matter of which is better, it's a matter of making the code
> > > > > > more robust because it has a single code path. If the cost of moving to
> > > > > > percpu_ref is negligible, I would suggest moving both; I don't want two
> > > > > > completely different mechanisms for blocking vs. non-blocking.
> > > > >
> > > > > FWIW, I proposed an hctx percpu_ref over a year ago (but for a
> > > > > completely different reason), and it was measured as too costly.
> > > > >
> > > > > https://lore.kernel.org/linux-block/d4a4b6c0-3ea8-f748-85b0-6b39c5023a6f@xxxxxxxxx/
> > > >
> > > > If this is the case, we shouldn't consider this as an alternative at all,
> > > > and should move forward with either the original proposal or what
> > > > Ming proposed: moving a counter to the tagset.
> > >
> > > Well, the point I was trying to make is that we shouldn't bother making
> > > blocking and non-blocking dispatchers use the same synchronization, since
> > > non-blocking has a very cheap solution that blocking can't use.
> >
> > I fully agree with that point.
>
> I also agree; I just said we should use the same mechanism IFF it's not
> expensive. But I'm concerned that if we use completely different mechanisms
> we are likely to make wrong assumptions and break one of them at some
> point.
>
> Hence my suggestion to move back to srcu and place the rcu_head in the hctx.

SRCU is different enough from RCU, in both implementation and API, that I'd
still suggest replacing SRCU with a percpu refcount. That would also give us a
simpler quiesce implementation.

Thanks,
Ming
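A minimal sketch of the refcount-based quiesce idea, written as userspace C11 so it can be compiled and run outside the kernel. It is an illustration under stated assumptions, not blk-mq code: `struct dispatch_ref`, `ref_tryget_live()`, `ref_put()`, `quiesce()` and `unquiesce()` are hypothetical names, and a single shared atomic counter stands in for the kernel's per-CPU `percpu_ref` (whose whole point is avoiding that shared cache line on the fast path). Roughly, the functions play the roles of `percpu_ref_tryget_live()`, `percpu_ref_put()`, `percpu_ref_kill()` followed by waiting for the count to drain, and `percpu_ref_resurrect()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: the top bit marks the ref as killed (quiesced),
 * the low bits count dispatchers currently inside the dispatch path.
 * A single atomic replaces the kernel's per-CPU counters here. */
#define REF_DEAD 0x80000000u

struct dispatch_ref {
	atomic_uint state; /* REF_DEAD flag | number of in-flight dispatchers */
};

static void ref_init(struct dispatch_ref *r)
{
	atomic_init(&r->state, 0);
}

/* Enter the dispatch path. Fails once quiesce has started, so no new
 * dispatcher can slip in after the DEAD flag is visible. */
static bool ref_tryget_live(struct dispatch_ref *r)
{
	unsigned int old = atomic_load(&r->state);

	do {
		if (old & REF_DEAD)
			return false;
	} while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
	return true;
}

/* Leave the dispatch path. The dispatcher may have slept in between,
 * which is what lets blocking and non-blocking share this path. */
static void ref_put(struct dispatch_ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->state, 1);

	assert((old & ~REF_DEAD) > 0); /* catch an unbalanced put */
}

/* Quiesce: refuse new dispatchers, then wait for in-flight ones to leave.
 * Spinning with sched_yield() is a simplification; a real implementation
 * would more likely sleep until a release callback signals zero. */
static void quiesce(struct dispatch_ref *r)
{
	atomic_fetch_or(&r->state, REF_DEAD);
	while ((atomic_load(&r->state) & ~REF_DEAD) != 0)
		sched_yield();
}

/* Unquiesce: clear the DEAD flag so dispatchers can enter again. */
static void unquiesce(struct dispatch_ref *r)
{
	atomic_fetch_and(&r->state, ~REF_DEAD);
}
```

Because a dispatcher holds a counted reference rather than sitting inside an RCU read-side critical section, the same path works whether or not it sleeps, which is the single-code-path property argued for in the thread; the open question there is whether the per-dispatch refcount cost is acceptable for the non-blocking case.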