On Thu, 2022-01-06 at 23:00 +0800, Ming Lei wrote:
> On Thu, Jan 06, 2022 at 10:26:01AM +0000, Martin Wilck wrote:
> > 
> > Alternatively, we could inhibit increasing the device queue depth above
> > a certain multiple of cmd_per_lun, and size the sbitmap by that limit.
> > My gut feeling says that if cmd_per_lun == 7, it makes sense to use a
> > limit of 32. That way the bitmap would fit into 2 pages; we'd still
> > waste a lot, but it wouldn't matter much in absolute numbers.
> > Thus we could forbid increasing the queue depth to more than the power
> > of 2 above 4*cmd_per_lun. Does this make sense?
> 
> I'd suggest to fix mpt3sas for avoiding this memory waste.

Let's wait for Sreekanth's comment on that.

mpt3sas is not the only driver using a low value. Qlogic drivers set
cmd_per_lun=3, for example (with 3, our logic would use shift=6, so the
issue I observed wouldn't occur - but it would be prone to cache line
bouncing).

> > (*) this calculation ignores the use of sb->map[i].depth. Taking it
> > into account wouldn't change much.
> 
> Yeah, I have actually one patch to remove sb->map[].depth, which can
> reduce each map's size by 1/3.

That sounds like a great idea to me. I've also been wondering whether
it wouldn't be possible to use more than a single word in a cache line
(given a high-enough number of cache lines).

Martin
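
PS: To make the numbers above easier to check, here is a rough userspace
sketch of the proposed limit and of the shift heuristic. It is not kernel
code. max_queue_depth() and pow2_at_least() are hypothetical names, and I'm
assuming "the power of 2 above 4*cmd_per_lun" means the smallest power of
two that is >= 4*cmd_per_lun. calc_shift() is only my model of how the
sbitmap code picks its shift on a 64-bit build; please check it against
lib/sbitmap.c before relying on it.

```c
#define BITS_PER_LONG_LOG2 6	/* assumption: 64-bit build, ilog2(64) */

/* Hypothetical helper: smallest power of two >= n (n > 0, no overflow check). */
unsigned int pow2_at_least(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Proposed cap on the queue depth for a given cmd_per_lun (hypothetical). */
unsigned int max_queue_depth(unsigned int cmd_per_lun)
{
	return pow2_at_least(4 * cmd_per_lun);
}

/*
 * Model of the shift heuristic (an assumption, not copied from the tree):
 * start with a full word per map entry and, for depth >= 4, shrink the
 * number of bits per word until at least 4 words are in use.
 */
int calc_shift(unsigned int depth)
{
	int shift = BITS_PER_LONG_LOG2;

	if (depth >= 4) {
		while ((4U << shift) > depth)
			shift--;
	}
	return shift;
}
```

Under these assumptions max_queue_depth(7) is 32, matching the limit
suggested above, and calc_shift(3) stays at 6 because depths below 4 are
left alone.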