On Thu, 2022-01-06 at 23:41 +0800, Ming Lei wrote: > On Thu, Jan 06, 2022 at 03:22:53PM +0000, Martin Wilck wrote: > > > > > > I'd suggest to fix mpt3sas for avoiding this memory waste. > > > > Let's wait for Sreekanth's comment on that. > > > > mpt3sas is not the only driver using a low value. Qlogic drivers > > set > > cmd_per_lun=3, for example (with 3, our logic would use shift=6, so > > the > > issue I observed wouldn't occur - but it would be prone to cache > > line > > bouncing). > > But qlogic has smaller .can_queue which looks at most 512, .can_queue > is > the depth for allocating sbitmap, since each sdev->queue_depth is <= > .can_queue. I'm seeing here (on an old kernel, admittedly) cmd_per_lun=3 and can_queue=2038 for qla2xxx and cmd_per_lun=3 and can_queue=5884 for lpfc. Both drivers change the queue depth for devices to 64 in their slave_configure() methods. Many drivers do this, as it's recommended in scsi_host.h. That's quite bad in view of the current bitmap allocation logic - we lay out the bitmap assuming the depth used will be cmd_per_lun, but that doesn't match the actual depth when the device comes online. For qla2xxx, it means that we'd allocate the sbitmap with shift=6 (64 bits per word), thus using just a single cache line for 64 requests. Shift=4 (16 bits per word) would be the default shift for depth 64. Am I misreading the code? Perhaps we should only allocate a preliminary sbitmap in scsi_alloc_sdev, and reallocate it after slave_configure() has been called, to get the shift right for the driver's default settings? Thanks, Martin