On Wed, Oct 25, 2023 at 12:01:33PM -0700, Bart Van Assche wrote:
> On 10/24/23 18:33, Ming Lei wrote:
> > Yeah, performance does drop when queue depth is cut to half if queue
> > depth is low enough.
> >
> > However, it isn't enough to just test perf over one LUN, what is the
> > perf effect when running IOs over the 2 or 5 data LUNs
> > concurrently?
>
> I think that the results I shared are sufficient because these show the
> worst possible performance impact of fair tag sharing (two active
> logical units and much more activity on one logical unit than on the
> other).

You are talking about the multi-lun case, and your change does affect the
multi-lun code path, but your test results don't cover multi-lun
workloads. Is that sufficient? At the very least, your patch shouldn't
cause a performance regression on multi-lun IO workloads, right?

> > SATA should have similar issue too, and I think the improvement may be
> > more generic to bypass fair tag sharing in case of low queue depth
> > (such as < 32) if turns out the fair tag sharing doesn't work well in
> > case low queue depth.
> >
> > Also the 'fairness' could be enhanced dynamically by scsi LUN's
> > queue depth, which can be adjusted dynamically.
>
> Most SATA devices are hard disks. Hard disk IOPS are constrained by the
> speed with which the head of a hard disk can move. That speed hasn't
> changed much during the past 40 years. I'm not sure that hard disks are
> impacted as much as SSD devices by fair tag sharing.

What I meant is that SATA's queue depth is often 32 or 31, and SATA also
has multi-lun cases. From what you shared, fair tag sharing performs
poorly simply because the queue depth is low; nothing about the problem
is actually specific to UFS. That is why I am wondering why we don't
disable fair tag sharing whenever the queue depth is low.

> Any algorithm that is more complicated than what I posted probably would
> have a negative performance impact on storage devices that use NAND
> technology, e.g. UFS devices.
> So I prefer to proceed with this patch
> series and solve any issues with ATA devices separately. Once this patch
> series has been merged, it could be used as a basis for a solution for
> ATA devices. A solution for ATA devices does not have to be implemented
> in the block layer core - it could e.g. be implemented in the ATA
> subsystem.

I don't object to taking the disabling of fair sharing first; I meant
that the fairness could be brought back later by adjusting the
scsi_device's queue depth, which can be changed dynamically.

Thanks,
Ming