Hi,

> On Tue, Oct 24, 2023 at 09:41:50AM -0700, Bart Van Assche wrote:
> > On 10/23/23 19:28, Ming Lei wrote:
> > > On Mon, Oct 23, 2023 at 01:36:32PM -0700, Bart Van Assche wrote:
> > > > Performance of UFS devices is reduced significantly by the fair
> > > > tag sharing algorithm. This is because UFS devices have multiple
> > > > logical units and a limited queue depth (32 for UFS 3.1 devices),
> > > > and also because it takes time to give tags back after activity
> > > > on a request queue has stopped. This patch series addresses this
> > > > issue by introducing a flag that allows block drivers to disable
> > > > fair sharing.
> > > >
> > > > Please consider this patch series for the next merge window.
> > >
> > > In your previous post [1] you mentioned that the issue is caused
> > > by the non-I/O queue of the WLUN, but that story is gone in this
> > > version.
> > >
> > > IMO, it isn't reasonable to account a non-I/O LUN for tag
> > > fairness, so a solution could be to not take non-I/O queues into
> > > account for fair tag sharing. Disabling fair tag sharing for the
> > > whole tagset could be overkill.
> > >
> > > And if you mean normal I/O LUNs, can you share more details about
> > > the performance drop? Such as the test case, how many I/O LUNs,
> > > and how the performance drop was observed, because it is no longer
> > > simple once multiple LUNs' performance has to be considered.
> > >
> > > [1] https://lore.kernel.org/linux-block/20231018180056.2151711-1-bvanassche@xxxxxxx/
> >
> > Hi Ming,
> >
> > Submitting I/O to a WLUN is only one example of a use case that
> > activates the fair sharing algorithm for UFS devices. Another use
> > case is simultaneous activity on multiple data LUNs. Conventional
> > UFS devices typically have four data LUNs and zoned UFS devices
> > typically have five. From an Android device with a zoned UFS device:
> >
> > $ adb shell ls /sys/class/scsi_device
> > 0:0:0:0
> > 0:0:0:1
> > 0:0:0:2
> > 0:0:0:3
> > 0:0:0:4
> > 0:0:0:49456
> > 0:0:0:49476
> > 0:0:0:49488
> >
> > The first five are data logical units. The last three are WLUNs.
> >
> > For a block size of 4 KiB, I see 144 K IOPS at queue depth 31 and
> > 107 K IOPS at queue depth 15 (the queue depth is reduced from 31 to
> > 15 if I/O is being submitted to two LUNs simultaneously). In other
> > words, disabling fair sharing results in up to 35% higher IOPS for
> > small reads when two logical units are active simultaneously. I
> > think that's a very significant performance difference.

This issue has been known for a long time now. Whenever we needed to
provide performance measurements, we used to disable that mechanism by
hacking the kernel locally, one way or another. I know that my peers
are doing this as well.

> Yeah, performance does drop when the queue depth is cut in half, if
> the queue depth is low enough.
>
> However, it isn't enough to just test performance on one LUN. What is
> the effect when running I/O on the 2 or 5 data LUNs concurrently?
>
> SATA should have a similar issue, and I think the improvement may be
> more generic: bypass fair tag sharing whenever the queue depth is low
> (such as < 32), if it turns out that fair tag sharing doesn't work
> well at low queue depths.
>
> Also, the 'fairness' could be enhanced dynamically via the SCSI LUNs'
> queue depths, which can be adjusted at runtime.

As far as our concerns as UFS device manufacturers and users go, we
find the current proposal a clean and elegant solution that we would
like to adopt. The sketches below illustrate what the mechanism costs
us and what could later be layered on top.
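For anyone who wants to see the arithmetic, here is a minimal
userspace model of the fair-share computation being discussed. It is
loosely based on hctx_may_queue() in block/blk-mq.h; the names and
the rounding below are approximations, not the exact upstream code:

#include <stdio.h>

/*
 * Approximation of the blk-mq fair share: the shared tags are split
 * evenly over the queues that were recently active, rounded up.
 */
static unsigned int fair_share(unsigned int tags, unsigned int active_queues)
{
	if (active_queues <= 1)
		return tags;	/* a sole user may take every tag */
	return (tags + active_queues - 1) / active_queues;
}

int main(void)
{
	/* UFS 3.1: a shared tag set of 32 tags */
	printf("1 active LUN:  %u tags\n", fair_share(32, 1)); /* 32 */
	printf("2 active LUNs: %u tags\n", fair_share(32, 2)); /* 16 */
	printf("5 active LUNs: %u tags\n", fair_share(32, 5)); /*  7 */
	return 0;
}

Any I/O to a WLUN, or to a second data LUN, therefore roughly halves
the depth available to the busy LUN, which matches the 31 -> 15 drop
Bart measured (the exact figures depend on reserved-tag accounting).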
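Ming's dynamic queue-depth idea could indeed be layered on top later.
A rough, hypothetical sketch of what such a policy might look like on
the SCSI side; scsi_change_queue_depth() and shost_for_each_device()
are existing kernel APIs, but the rebalancing policy itself is made
up for illustration:

#include <linux/minmax.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/*
 * Hypothetical policy: split the host's tags evenly over the LUNs
 * that did I/O recently, instead of over every LUN that ever queued.
 * How busy_luns is tracked is deliberately left open here.
 */
static void rebalance_lun_depths(struct Scsi_Host *shost,
				 unsigned int busy_luns)
{
	struct scsi_device *sdev;
	int share = max_t(int, shost->can_queue / (int)max(busy_luns, 1u), 1);

	shost_for_each_device(sdev, shost)
		scsi_change_queue_depth(sdev, share);
}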
Further, more sophisticated schemes like that can be adopted on top of
disabling tag sharing.

Thanks,
Avri

>
>
> Thanks,
> Ming