On 5/30/24 13:47, Keith Busch wrote:
> I suggested running a more lopsided workload on a high-contention tag
> set. Here's an example fio profile to exaggerate this:
>
> ---
> [global]
> rw=randread
> direct=1
> ioengine=io_uring
> time_based
> runtime=60
> ramp_time=10
>
> [zero]
> bs=131072
> filename=/dev/nvme0n1
> iodepth=256
> iodepth_batch_submit=64
> iodepth_batch_complete=64
>
> [one]
> bs=512
> filename=/dev/nvme0n2
> iodepth=1
> --
>
> My test nvme device has 2 namespaces, 1 IO queue, and only 63 tags.
>
> Without your patch:
>
> zero: (groupid=0, jobs=1): err= 0: pid=465: Thu May 30 13:29:43 2024
>   read: IOPS=14.0k, BW=1749MiB/s (1834MB/s)(103GiB/60002msec)
>     lat (usec): min=2937, max=40980, avg=16990.33, stdev=1732.37
> ...
> one: (groupid=0, jobs=1): err= 0: pid=466: Thu May 30 13:29:43 2024
>   read: IOPS=2726, BW=1363KiB/s (1396kB/s)(79.9MiB/60001msec)
>     lat (usec): min=45, max=4859, avg=327.52, stdev=335.25
>
> With your patch:
>
> zero: (groupid=0, jobs=1): err= 0: pid=341: Thu May 30 13:36:26 2024
>   read: IOPS=14.8k, BW=1852MiB/s (1942MB/s)(109GiB/60004msec)
>     lat (usec): min=3103, max=26191, avg=16322.77, stdev=1138.04
> ...
> one: (groupid=0, jobs=1): err= 0: pid=342: Thu May 30 13:36:26 2024
>   read: IOPS=1841, BW=921KiB/s (943kB/s)(54.0MiB/60001msec)
>     lat (usec): min=51, max=5935, avg=503.81, stdev=608.41
>
> So there's definitely a difference here that harms the lesser-used
> device for a modest gain on the more demanding device. Does it matter?
> I really don't know if I can answer that. It's just different, is all
> I'm saying.
Hi Keith,

Thank you for running this test. I propose that users who want better
fairness than what my patch provides use an appropriate mechanism for
improving fairness (e.g. blk-iocost or blk-iolatency). This leaves the
choice between maximum performance and maximum fairness to the user.
Does this sound good to you?

Thanks,

Bart.
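
P.S. In case it helps, here is a rough sketch of what such a cgroup v2
configuration could look like. The device numbers (259:0, 259:1), cgroup
names and weight/target values below are made-up examples, and the exact
io.cost.qos / io.latency syntax should be double-checked against
Documentation/admin-guide/cgroup-v2.rst:

---
# Enable the io controller for child cgroups and create one per fio job.
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/small-io /sys/fs/cgroup/bulk-io

# Option 1: blk-iocost. Enable it on the device (root cgroup only) and
# give the small-I/O job a larger proportional weight than the bulk job.
echo "259:0 enable=1" > /sys/fs/cgroup/io.cost.qos
echo "default 800" > /sys/fs/cgroup/small-io/io.weight
echo "default 100" > /sys/fs/cgroup/bulk-io/io.weight

# Option 2: blk-iolatency. Set a latency target (microseconds) for the
# small-I/O cgroup; I/O from sibling cgroups is throttled whenever that
# target is being missed.
echo "259:1 target=1000" > /sys/fs/cgroup/small-io/io.latency

# Start each fio job from a shell that has been moved into its cgroup.
echo $$ > /sys/fs/cgroup/small-io/cgroup.procs
--

The two options are alternatives; typically only one of these mechanisms
would be used for a given device.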