Hi,

> On Tue, Oct 24, 2023 at 09:41:50AM -0700, Bart Van Assche wrote:
> > On 10/23/23 19:28, Ming Lei wrote:
> > > On Mon, Oct 23, 2023 at 01:36:32PM -0700, Bart Van Assche wrote:
> > > > Performance of UFS devices is reduced significantly by the fair
> > > > tag sharing algorithm. This is because UFS devices have multiple
> > > > logical units and a limited queue depth (32 for UFS 3.1 devices),
> > > > and also because it takes time to give tags back after activity
> > > > on a request queue has stopped. This patch series addresses this
> > > > issue by introducing a flag that allows block drivers to disable
> > > > fair sharing.
> > > >
> > > > Please consider this patch series for the next merge window.
> > >
> > > In your previous post [1] you mentioned that the issue is caused
> > > by the non-I/O queue of the WLUN, but that story is gone in this
> > > version.
> > >
> > > IMO, it isn't reasonable to account a non-I/O LUN for tag
> > > fairness, so a solution could be to not take non-I/O queues into
> > > account for fair tag sharing. Disabling fair tag sharing for the
> > > whole tagset could be overkill.
> > >
> > > And if you mean normal I/O LUNs, can you share more details about
> > > the performance drop? Such as the test case, how many I/O LUNs,
> > > and how the performance drop was observed, because it is no longer
> > > simple once multiple LUNs' performance has to be considered.
> > >
> > > [1] https://lore.kernel.org/linux-block/20231018180056.2151711-1-bvanassche@xxxxxxx/
> >
> > Hi Ming,
> >
> > Submitting I/O to a WLUN is only one example of a use case that
> > activates the fair sharing algorithm for UFS devices. Another use
> > case is simultaneous activity on multiple data LUNs. Conventional
> > UFS devices typically have four data LUNs and zoned UFS devices
> > typically have five. From an Android device with a zoned UFS device:
> >
> > $ adb shell ls /sys/class/scsi_device
> > 0:0:0:0
> > 0:0:0:1
> > 0:0:0:2
> > 0:0:0:3
> > 0:0:0:4
> > 0:0:0:49456
> > 0:0:0:49476
> > 0:0:0:49488
> >
> > The first five are data logical units. The last three are WLUNs.
> >
> > For a block size of 4 KiB, I see 144 K IOPS at queue depth 31 and
> > 107 K IOPS at queue depth 15 (the queue depth is reduced from 31 to
> > 15 if I/O is being submitted to two LUNs simultaneously). In other
> > words, disabling fair sharing results in up to 35% higher IOPS for
> > small reads when two logical units are active simultaneously. I
> > think that's a very significant performance difference.

This issue has been known for a long time now. Whenever we needed to
provide performance measurements, we used to disable that mechanism by
hacking the kernel locally, one way or another. I know that my peers
are doing this as well.

> Yeah, performance does drop when the queue depth is cut in half, if
> the queue depth is low enough.
>
> However, it isn't enough to just test performance on one LUN. What is
> the effect when running I/O on the 2 or 5 data LUNs concurrently?
>
> SATA should have a similar issue, and I think the improvement may be
> more generic: bypass fair tag sharing whenever the queue depth is low
> (such as < 32), if it turns out that fair tag sharing doesn't work
> well at low queue depths.
>
> Also, the 'fairness' could be enhanced dynamically via the SCSI LUNs'
> queue depths, which can be adjusted at runtime.

As far as our concerns as UFS device manufacturers and users go, we
find the current proposal a clean and elegant solution that we would
like to adopt. The sketches below illustrate what the mechanism costs
us and what could later be layered on top.
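For anyone who wants to see the arithmetic, here is a minimal
userspace model of the fair-share computation being discussed. It is
loosely based on hctx_may_queue() in block/blk-mq.h; the names and
the rounding below are approximations, not the exact upstream code:

#include <stdio.h>

/*
 * Approximation of the blk-mq fair share: the shared tags are split
 * evenly over the queues that were recently active, rounded up.
 */
static unsigned int fair_share(unsigned int tags, unsigned int active_queues)
{
	if (active_queues <= 1)
		return tags;	/* a sole user may take every tag */
	return (tags + active_queues - 1) / active_queues;
}

int main(void)
{
	/* UFS 3.1: a shared tag set of 32 tags */
	printf("1 active LUN:  %u tags\n", fair_share(32, 1)); /* 32 */
	printf("2 active LUNs: %u tags\n", fair_share(32, 2)); /* 16 */
	printf("5 active LUNs: %u tags\n", fair_share(32, 5)); /*  7 */
	return 0;
}

Any I/O to a WLUN, or to a second data LUN, therefore roughly halves
the depth available to the busy LUN, which matches the 31 -> 15 drop
Bart measured (the exact figures depend on reserved-tag accounting).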
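Ming's dynamic queue-depth idea could indeed be layered on top later.
A rough, hypothetical sketch of what such a policy might look like on
the SCSI side; scsi_change_queue_depth() and shost_for_each_device()
are existing kernel APIs, but the rebalancing policy itself is made
up for illustration:

#include <linux/minmax.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/*
 * Hypothetical policy: split the host's tags evenly over the LUNs
 * that did I/O recently, instead of over every LUN that ever queued.
 * How busy_luns is tracked is deliberately left open here.
 */
static void rebalance_lun_depths(struct Scsi_Host *shost,
				 unsigned int busy_luns)
{
	struct scsi_device *sdev;
	int share = max_t(int, shost->can_queue / (int)max(busy_luns, 1u), 1);

	shost_for_each_device(sdev, shost)
		scsi_change_queue_depth(sdev, share);
}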
Further, more sophisticated schemes like that can be adopted on top of
disabling tag sharing.

Thanks,
Avri

>
>
> Thanks,
> Ming