On Tue, Jul 28, 2020 at 08:54:27AM +0100, John Garry wrote:
> On 24/07/2020 03:47, Ming Lei wrote:
> > On Thu, Jul 23, 2020 at 06:29:01PM +0100, John Garry wrote:
> > > > > As I see, since megaraid will have 1:1 mapping of CPU to hw queue, will
> > > > > there only ever possibly a single bit set in ctx_map? If so, it seems a
> > > > > waste to always check every sbitmap map. But adding logic for this may
> > > > > negate any possible gains.
> > > >
> > > > It really depends on min and max cpu id in the map, then sbitmap
> > > > depth can be reduced to (max - min + 1). I'd suggest to double check that
> > > > cost of sbitmap_any_bit_set() really matters.
> > >
> > > Hi Ming,
> > >
> > > I'm not sure that reducing the search range would help much, as we still
> > > need to load some indexes of map[], and at best this may be reduced from 2/3
> > > -> 1 elements, depending on nr_cpus.
> >
> > I believe you misunderstood my idea, and you have to think it from implementation
> > viewpoint.
> >
> > The only workable way is to store the min cpu id as 'offset' and set the sbitmap
> > depth as (max - min + 1), isn't it? Then the actual cpu id can be figured out via
> > 'offset' + nr_bit. And the whole indexes are just spread on the actual depth. BTW,
> > max & min is the max / min cpu id in hctx->cpu_map. So we can improve not only on 1:1,
> > and I guess most of MQ cases can benefit from the change, since it shouldn't be usual
> > for one ctx_map to cover both 0 & nr_cpu_id - 1.
> >
> > Meantime, we need to allocate the sbitmap dynamically.
>
> OK, so dynamically allocating the sbitmap could be good. I was thinking
> previously that we still allocate for nr_cpus size, and search a limited
> range - but this would have heavier runtime overhead.
>
> So if you really think that this may have some value, then let me know, so
> we can look to take it forward.

Forgot to mention: the in-tree code has had this shape for a long time;
please see sbitmap_resize(), called from blk_mq_map_swqueue().

Another update: V4 of 'scsi: core: only re-run queue in scsi_end_request()
if device queue is busy' is quite hard to implement since commit
b4fd63f42647110c9 ("Revert "scsi: core: run queue if SCSI device queue
isn't ready and queue is idle"").

Thanks,
Ming
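
For reference, the 'offset' scheme quoted above (store the min cpu id,
size the bitmap as max - min + 1, recover the cpu id as offset + bit) can
be sketched in plain userspace C. This is only an illustration under
stated assumptions: struct ctx_bitmap and the ctx_bitmap_*() helpers are
hypothetical names made up for this sketch, and it does not use or mirror
the kernel's struct sbitmap implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a per-hw-queue pending-ctx bitmap. */
struct ctx_bitmap {
	unsigned int offset;	/* min cpu id mapped to this hw queue */
	unsigned int depth;	/* max - min + 1 */
	unsigned int nr_words;	/* words needed to hold 'depth' bits */
	unsigned long *words;	/* allocated dynamically, zero-initialised */
};

/* Size the bitmap for the cpu range [min_cpu, max_cpu] only. */
static int ctx_bitmap_init(struct ctx_bitmap *bm,
			   unsigned int min_cpu, unsigned int max_cpu)
{
	if (max_cpu < min_cpu)
		return -1;

	bm->offset = min_cpu;
	bm->depth = max_cpu - min_cpu + 1;
	bm->nr_words = (bm->depth + BITS_PER_WORD - 1) / BITS_PER_WORD;
	bm->words = calloc(bm->nr_words, sizeof(*bm->words));
	return bm->words ? 0 : -1;
}

/* Mark 'cpu' as pending; the stored bit index is cpu - offset. */
static void ctx_bitmap_set(struct ctx_bitmap *bm, unsigned int cpu)
{
	unsigned int bit;

	assert(cpu >= bm->offset && cpu - bm->offset < bm->depth);
	bit = cpu - bm->offset;
	bm->words[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Only nr_words words are loaded, not enough to cover every cpu id. */
static bool ctx_bitmap_any_set(const struct ctx_bitmap *bm)
{
	for (unsigned int i = 0; i < bm->nr_words; i++)
		if (bm->words[i])
			return true;
	return false;
}

/* Return the lowest pending cpu id (offset + bit), or -1 if none. */
static int ctx_bitmap_first_cpu(const struct ctx_bitmap *bm)
{
	for (unsigned int bit = 0; bit < bm->depth; bit++)
		if (bm->words[bit / BITS_PER_WORD] &
		    (1UL << (bit % BITS_PER_WORD)))
			return (int)(bm->offset + bit);
	return -1;
}
```

With, say, cpus 64..67 mapped to one hw queue, ctx_bitmap_init(&bm, 64, 67)
gives a depth of 4 and a single word to scan, and a bit set for cpu 66 is
reported back as 66 by ctx_bitmap_first_cpu().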