On 28/07/2020 09:45, Ming Lei wrote:
>> OK, so dynamically allocating the sbitmap could be good. I was
>> thinking previously that we would still allocate at nr_cpus size and
>> only search a limited range - but that would have heavier runtime
>> overhead. So if you really think that this may have some value, then
>> let me know, so we can look to take it forward.
Hi Ming,
> Forgot to mention: the in-tree code has been this shape for a long
> time - please see sbitmap_resize() called from blk_mq_map_swqueue().
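
Right. For reference, the call in question looks roughly like this
(abridged from block/blk-mq.c as I remember it - exact surrounding
code may differ by kernel version):

	/* block/blk-mq.c (abridged sketch) */
	static void blk_mq_map_swqueue(struct request_queue *q)
	{
		struct blk_mq_hw_ctx *hctx;
		unsigned int i;

		queue_for_each_hw_ctx(q, hctx, i) {
			/* ... */
			/*
			 * Shrink hctx->ctx_map to cover only the ctxs
			 * actually mapped to this hctx, rather than
			 * all nr_cpu_ids bits.
			 */
			sbitmap_resize(&hctx->ctx_map, hctx->nr_ctx);
		}
	}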
So after the resize, even if we are only checking a single word and a
few bits within that word, we still need 2x 64b loads - 1x for .word
and 1x for .cleared. That seems a bit cache-inefficient when we have a
1:1 (or similar) ctx:hctx mapping. For the 1:1 case only, how about a
single map per request queue covering all hctxs, with one bit per
hctx? I do realize that it would make the code more complicated, but
it could be more efficient - see the sketch below.
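
To illustrate: today the set-bit iterator has to mask the two fields
per word - roughly what __sbitmap_for_each_set() in
include/linux/sbitmap.h does:

	/* include/linux/sbitmap.h (abridged) */
	word = sb->map[index].word & ~sb->map[index].cleared;

whereas the 1:1 case could get away with a single load from a
per-queue map. Entirely hypothetical sketch - the field and helper
names below are made up:

	/* Hypothetical new field in struct request_queue: */
	unsigned long *hctx_pending;	/* bit per hctx: has pending ctxs */

	/* Hypothetical helper - one load, no .cleared to mask off: */
	static inline bool blk_mq_hctx_pending(struct request_queue *q,
					       unsigned int hctx_idx)
	{
		return test_bit(hctx_idx, q->hctx_pending);
	}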
Another thing to consider is that for ctx_map we never do the deferred
bit clear, so we should never really need to check .cleared there at
all. I think - see below.
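
That is, blk-mq clears ctx_map bits synchronously (this one is the
in-tree code from block/blk-mq.c, unless I'm misremembering):

	static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
					      struct blk_mq_ctx *ctx)
	{
		sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw[hctx->type]);
	}

The deferred path - sbitmap_deferred_clear_bit(), which sets .cleared
rather than clearing .word - is AFAICS only used on the tags side, via
sbitmap_queue_clear().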
Thanks,
John