Hi, I've found that the latency reported by fio (running on my embedded board) increases when the file size is increased. The output is:

fio --direct=0 --iodepth=1 --thread --rw=read --ioengine=sync --bs=4k --size=16M --numjobs=1 --runtime=30 --group_reporting --filename=/tmp/f0 --name=dummy

f0: Laying out IO file (1 file / 16MiB)

f0: (groupid=0, jobs=1): err= 0: pid=3: Mon Jun 26 05:41:47 2023
  read: IOPS=15.6k, BW=61.1MiB/s (64.0MB/s)(16.0MiB/262msec)
    clat (usec): min=51, max=181, avg=55.75, stdev= 5.63
     lat (usec): min=52, max=182, avg=56.62, stdev= 5.67
    clat percentiles (usec):
     |  1.00th=[   52],  5.00th=[   53], 10.00th=[   53], 20.00th=[   54],
     | 30.00th=[   55], 40.00th=[   55], 50.00th=[   56], 60.00th=[   56],
     | 70.00th=[   57], 80.00th=[   57], 90.00th=[   58], 95.00th=[   60],
     | 99.00th=[   85], 99.50th=[   92], 99.90th=[  108], 99.95th=[  127],
     | 99.99th=[  182]
  lat (usec)   : 100=99.73%, 250=0.27%
  cpu          : usr=35.25%, sys=0.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4096,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=61.1MiB/s (64.0MB/s), 61.1MiB/s-61.1MiB/s (64.0MB/s-64.0MB/s), io=16.0MiB (16.8MB), run=262-262msec

fio --direct=0 --iodepth=1 --thread --rw=read --ioengine=sync --bs=4k --size=32M --numjobs=1 --runtime=30 --group_reporting --filename=/tmp/f0 --name=dummy

f0: Laying out IO file (1 file / 32MiB)

f0: (groupid=0, jobs=1): err= 0: pid=3: Mon Jun 26 05:43:47 2023
  read: IOPS=12.4k, BW=48.4MiB/s (50.8MB/s)(32.0MiB/661msec)
    clat (usec): min=64, max=420, avg=72.39, stdev= 8.85
     lat (usec): min=65, max=421, avg=73.27, stdev= 8.87
    clat percentiles (usec):
     |  1.00th=[   67],  5.00th=[   68], 10.00th=[   68], 20.00th=[   69],
     | 30.00th=[   70], 40.00th=[   71], 50.00th=[   72], 60.00th=[   73],
     | 70.00th=[   74], 80.00th=[   75], 90.00th=[   76], 95.00th=[   78],
     | 99.00th=[  115], 99.50th=[  131], 99.90th=[  163], 99.95th=[  169],
     | 99.99th=[  420]
   bw (  KiB/s): min=48840, max=48840, per=98.52%, avg=48840.00, stdev= 0.00, samples=1
   iops        : min=12210, max=12210, avg=12210.00, stdev= 0.00, samples=1
  lat (usec)   : 100=98.57%, 250=1.42%, 500=0.01%
  cpu          : usr=23.03%, sys=0.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=48.4MiB/s (50.8MB/s), 48.4MiB/s-48.4MiB/s (50.8MB/s-50.8MB/s), io=32.0MiB (33.6MB), run=661-661msec

Can anyone help explain why the latency increases (avg clat goes from ~56 usec to ~72 usec) when the size changes from 16M to 32M? Thanks!
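In case it helps to reproduce: below is a minimal sketch of how the two runs can be driven back to back. It drops the Linux page cache before each run so that neither buffered-read run (--direct=0) starts with the file already cached. This assumes a Linux kernel that exposes /proc/sys/vm/drop_caches and root privileges; adjust the paths for your own board.

#!/bin/sh
# Run both sizes against the same file, starting each run with a cold
# page cache (relevant because --direct=0 means buffered reads).
for sz in 16M 32M; do
    sync
    echo 3 > /proc/sys/vm/drop_caches   # free page cache, dentries, inodes
    fio --direct=0 --iodepth=1 --thread --rw=read --ioengine=sync \
        --bs=4k --size=$sz --numjobs=1 --runtime=30 --group_reporting \
        --filename=/tmp/f0 --name=dummy
done

(Note that /tmp/f0 is the same test file as in the runs above; if /tmp is mounted differently on your system, point --filename at the device you actually want to measure.)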