I think it will be a significant improvement to have a single code path.
The code will be more robust and we won't have to deal with issues that
are specific to blocking.
If the cost is negligible, I think the upside is worth it.
rcu_read_lock and rcu_read_unlock have proven to be efficient enough,
and I don't think percpu_refcount is better than them, so I'd suggest
not switching the non-blocking case over to it.
It's not a matter of which is better, it's a matter of making the code
more robust because it has a single code path. If the cost of moving to
percpu_ref is negligible, I would suggest moving both. I don't want to
have two completely different mechanisms for blocking vs. non-blocking.
FWIW, I proposed an hctx percpu_ref over a year ago (but for a
completely different reason), and it was measured as too costly.
https://lore.kernel.org/linux-block/d4a4b6c0-3ea8-f748-85b0-6b39c5023a6f@xxxxxxxxx/
If that is the case, we shouldn't consider this as an alternative at
all, and should move forward with either the original proposal or
Ming's proposal to move a counter to the tagset.