I think it will be a significant improvement to have a single code path.
The code will be more robust and we won't need to face issues that are
specific to blocking.
If the cost is negligible, I think the upside is worth it.
rcu_read_lock and rcu_read_unlock have been proven efficient enough,
and I don't think percpu_refcount is better than them, so I'd suggest
not switching non-blocking over to it.
It's not a matter of which is better, it's a matter of making the code
more robust because it has a single code path. If the cost of moving to
percpu_ref is negligible, I would suggest moving both; I don't want to
have two completely different mechanisms for blocking vs. non-blocking.
FWIW, I proposed an hctx percpu_ref over a year ago (but for a
completely different reason), and it was measured as too costly.
https://lore.kernel.org/linux-block/d4a4b6c0-3ea8-f748-85b0-6b39c5023a6f@xxxxxxxxx/
If this is the case, we shouldn't consider this as an alternative at all,
and move forward with either the original proposal or what
Ming proposed, which is to move a counter to the tagset.
Well, the point I was trying to make is that we shouldn't bother making
blocking and non-blocking dispatchers use the same synchronization since
non-blocking has a very cheap solution that blocking can't use.
I fully agree with that point.
I also agree; I only said we should use the same mechanism iff it's not
expensive. But I'm concerned that if we use completely different
mechanisms, we are likely to make wrong assumptions and break one of
them at some point.
Hence my suggestion to move back to srcu and place the rcu_head in the hctx.
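
For reference, the two dispatch paths being compared can be modelled roughly as below. This is only a userspace sketch, not kernel code: struct model_hctx, model_hctx_lock() and model_hctx_unlock() are hypothetical names, and the plain counters merely stand in for rcu_read_lock()/rcu_read_unlock() on the non-blocking path and an SRCU read-side section on the blocking path. The "blocking" field is assumed to play the role of the BLK_MQ_F_BLOCKING flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum sync_kind { SYNC_RCU, SYNC_SRCU };

struct model_hctx {
	bool blocking;       /* assumed stand-in for BLK_MQ_F_BLOCKING */
	int rcu_readers;     /* models readers inside rcu_read_lock() */
	int srcu_readers[2]; /* models SRCU reader counts, one per index */
	int srcu_idx;        /* index handed to new SRCU readers */
};

/* Enter the read-side section; the path taken depends on the flag. */
static enum sync_kind model_hctx_lock(struct model_hctx *h, int *idx)
{
	if (!h->blocking) {
		/* Non-blocking: cheap path, no per-hctx state is needed. */
		*idx = 0;
		h->rcu_readers++;
		return SYNC_RCU;
	}
	/* Blocking: the reader may sleep, so it is tracked per index. */
	*idx = h->srcu_idx;
	h->srcu_readers[*idx]++;
	return SYNC_SRCU;
}

/* Leave the read-side section using the index returned at lock time. */
static void model_hctx_unlock(struct model_hctx *h, int idx)
{
	if (!h->blocking)
		h->rcu_readers--;
	else
		h->srcu_readers[idx]--;
}
```

Every caller has to carry the index from lock to unlock even though only the blocking path uses it, which is the kind of asymmetry the single-code-path argument is about.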