On Mon, Nov 21, 2022 at 2:19 PM Michael Wu <michael@xxxxxxxxxxxxxxxxx> wrote:
>
> On 11/18/2022 7:43 PM, Wenchao Chen wrote:
> > On Fri, Nov 18, 2022 at 1:52 PM Michael Wu <michael@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> The current next_tag selection can add a large delay to some requests
> >> and defeat the ordering chosen by the block layer scheduler, because
> >> the tags of issued mrqs are not guaranteed to be sequential, especially
> >> under heavy IO load. In a fio performance test with 4k random reads, we
> >> found that a request could wait nearly 200ms between being sent to
> >> mmc_hsq and request_atomic being called for it, while mmc_hsq had
> >> already processed thousands of other requests. So we use a FIFO here to
> >> preserve the first-in, first-out order of requests and avoid adding
> >> extra delay to any request.
> >>
> >
> > Hi Michael,
> > Is the test device an eMMC?
> > Could you share the fio test command?
> > Can you provide more logs?
> >
> Hi Wenchao,
> Yes, the tested device is an eMMC.
> The test command we used is `./fio -name=Rand_Read_IOPS_Test
> -group_reporting -rw=random -bs=4K -numjobs=8 -directory=/data/data
> -size=1G -io_size=64M -nrfiles=1 -direct=1 -thread && rm
> /data/Rand_Read_IOPS_Test*`, which replaces the androidbench random read
> performance test; the file size is set to 1G with an 8-thread test
> configuration. /data uses f2fs and /data/data is a file-encrypted path.
>
> With the hsq configuration enabled, we can clearly see from the fio test
> log below that random read IOPS ranges from a minimum of 3175 to a
> maximum of 8554, and the maximum IO completion latency is about 200ms.
> ```
> clat percentiles (usec):
>  |  1.00th=[   498],  5.00th=[   865], 10.00th=[   963], 20.00th=[  1045],
>  | 30.00th=[  1090], 40.00th=[  1139], 50.00th=[  1172], 60.00th=[  1221],
>  | 70.00th=[  1254], 80.00th=[  1319], 90.00th=[  1401], 95.00th=[  1614],
>  | 99.00th=[  2769], 99.50th=[  3589], 99.90th=[ 31589], 99.95th=[ 66323],
>  | 99.99th=[200279]
> bw (  KiB/s): min=12705, max=34225, per=100.00%, avg=23931.79, stdev=497.40, samples=345
> iops        : min= 3175, max= 8554, avg=5981.67, stdev=124.38, samples=345
> ```
>
> ```
> clat percentiles (usec):
>  |  1.00th=[  799],  5.00th=[  938], 10.00th=[  963], 20.00th=[  979],
>  | 30.00th=[  996], 40.00th=[ 1004], 50.00th=[ 1020], 60.00th=[ 1045],
>  | 70.00th=[ 1074], 80.00th=[ 1106], 90.00th=[ 1172], 95.00th=[ 1237],
>  | 99.00th=[ 1450], 99.50th=[ 1516], 99.90th=[ 1762], 99.95th=[ 2180],
>  | 99.99th=[ 9503]
> bw (  KiB/s): min=29200, max=30944, per=100.00%, avg=30178.91, stdev=53.45, samples=272
> iops        : min= 7300, max= 7736, avg=7544.62, stdev=13.38, samples=272
> ```
> With hsq NOT enabled, random read IOPS ranges from a minimum of 7300 to
> a maximum of 7736, and the maximum IO latency is only about 9ms.
> Finally, we added debug output to the mmc driver and traced the 200ms
> hsq delay to hsq's next_tag selection.
>
Thank you very much for your log. This patch can reduce latency, but I have some questions:
1. fio's -rw option does not accept "random", but it does accept "randread". Did you use randread?
   In addition, does "io_size=64M" mean that only 64M of data is tested?
   Refer to the fio documentation: https://fio.readthedocs.io/en/latest/fio_doc.html?highlight=io_size#cmdoption-arg-io-size
2. The naming style of "tag_tail" should remain consistent with that of "next_tag". Would "tail_tag" be better?
3. It would be better to also provide a comparison of sequential read, sequential write, and random write.
> ---
> Michael Wu
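
For readers following the thread, here is a minimal, self-contained C sketch of the ordering issue being discussed: scanning slots round-robin from a next_tag position can issue a later-arriving request ahead of an earlier one, while a FIFO of tags preserves arrival order. The structures and function names below (pick_next_round_robin, fifo_push, fifo_pop, HSQ_NUM_SLOTS) are illustrative assumptions for demonstration only, not the actual mmc_hsq implementation or the patch under review.

```c
/*
 * Hypothetical illustration (not the real mmc_hsq code): compare picking
 * the next slot by scanning tags round-robin from next_tag with pulling
 * tags in the order they were queued (FIFO).
 */
#include <stdbool.h>
#include <stdio.h>

#define HSQ_NUM_SLOTS 8

struct slot {
	bool busy;	/* a request currently occupies this slot */
};

/*
 * Round-robin style: scan slots starting from *next_tag. A request whose
 * tag sits just behind the scan position can be passed over while requests
 * that arrived later are issued first.
 */
static int pick_next_round_robin(struct slot *slots, int *next_tag)
{
	for (int i = 0; i < HSQ_NUM_SLOTS; i++) {
		int tag = (*next_tag + i) % HSQ_NUM_SLOTS;

		if (slots[tag].busy) {
			*next_tag = (tag + 1) % HSQ_NUM_SLOTS;
			return tag;
		}
	}
	return -1;
}

/*
 * FIFO style: record tags in arrival order and always issue the oldest,
 * so no request can be starved by later arrivals.
 */
struct fifo {
	int tags[HSQ_NUM_SLOTS];
	int head, tail, count;
};

static void fifo_push(struct fifo *f, int tag)
{
	f->tags[f->tail] = tag;
	f->tail = (f->tail + 1) % HSQ_NUM_SLOTS;
	f->count++;
}

static int fifo_pop(struct fifo *f)
{
	int tag;

	if (!f->count)
		return -1;
	tag = f->tags[f->head];
	f->head = (f->head + 1) % HSQ_NUM_SLOTS;
	f->count--;
	return tag;
}

int main(void)
{
	struct slot slots[HSQ_NUM_SLOTS] = { 0 };
	struct fifo fifo = { 0 };
	int next_tag = 0;
	/* Requests arrive out of tag order: tag 5 first, then tag 1, then 7. */
	int arrival[] = { 5, 1, 7 };

	for (int i = 0; i < 3; i++) {
		slots[arrival[i]].busy = true;
		fifo_push(&fifo, arrival[i]);
	}

	/* The scan from next_tag=0 issues tag 1 before tag 5, inverting
	 * arrival order; the FIFO preserves it. */
	printf("round-robin first pick: %d\n", pick_next_round_robin(slots, &next_tag));
	printf("fifo first pick       : %d\n", fifo_pop(&fifo));
	return 0;
}
```

The FIFO variant matches the intent stated in the patch description: issue order follows arrival order, so a single request cannot wait hundreds of milliseconds while later tags are repeatedly picked ahead of it.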