Re: sequential 1MB mmap read ends in 1 page sync read-ahead

Sorry, forgot to add Andrew and linux-mm into CC.

On 4/13/23 23:27, Bernd Schubert wrote:
Hello,

I found a weird mmap read behavior while benchmarking the fuse-uring patches.
I have not verified it yet, but it does not look fuse specific.
I started investigating because fio results were much lower
than expected (though better with the new code).

fio cmd line:
fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse --directory=/scratch/dest/ --rw=read multi-file.fio


bernd@squeeze1 test2>cat multi-file.fio
[global]
group_reporting
bs=1M
runtime=300

[test]

This sequential fio job sets POSIX_MADV_SEQUENTIAL and then does memcpy
beginning at offset 0 in 1MB steps (verified with additional
logging in fio's engines/mmap.c).

An additional log in fuse_readahead() gives:

[ 1396.215084] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 index=0
[ 1396.237466] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 index=255
[ 1396.263175] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 index=254
[ 1396.282055] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 index=253
... <count is always 1 page>
[ 1496.353745] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 index=64
[ 1496.381105] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64 index=511
[ 1496.397487] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 index=510
[ 1496.416385] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1 index=509
... <count is always 1 page>

Logging in do_sync_mmap_readahead() gives:

[ 1493.130764] do_sync_mmap_readahead:3015 ino=132 index=0 count=0 ras_start=0 ras_size=0 ras_async=0 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1
[ 1493.147173] do_sync_mmap_readahead:3015 ino=132 index=255 count=0 ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1
[ 1493.165952] do_sync_mmap_readahead:3015 ino=132 index=254 count=0 ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1
[ 1493.185566] do_sync_mmap_readahead:3015 ino=132 index=253 count=0 ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1
...
[ 1496.341890] do_sync_mmap_readahead:3015 ino=132 index=64 count=0 ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1
[ 1496.361385] do_sync_mmap_readahead:3015 ino=132 index=511 count=0 ras_start=96 ras_size=64 ras_async=64 ras_ra_pages=64 ras_mmap_miss=0 ras_prev_pos=-1


So we can see from fuse that it starts reading at page index 0, requesting
64 pages (which is actually double the bdi read_ahead_kb), then
skips indexes 64..254 and jumps straight to index 255. The pages in between
are missing for the mmapped memcpy, so it walks back in 1-page steps to
fetch them.

A workaround here is to set read_ahead_kb in the bdi to a larger
value; another workaround might be (untested) to increase the read-ahead
window. Either of these only seems to paper over the index order
above.
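For reference, the bdi knob I mean is the per-device read_ahead_kb in sysfs (the bdi id "0:42" below is just an example; the actual id for a fuse mount can be found via the device number in /proc/self/mountinfo):

```shell
# Inspect the current read-ahead window of the bdi (default is
# typically 128 KiB) and raise it, e.g. to 1 MiB.
cat /sys/class/bdi/0:42/read_ahead_kb
echo 1024 > /sys/class/bdi/0:42/read_ahead_kb
```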

I understand that read-ahead gets limited by the bdi value (although it
is exceeded above), but why does it walk back in 1-page steps? My
expectation would have been:

index=0  count=32 (128kb read-ahead)
index=32 count=32
index=64 count=32
...


This is with plain 6.2 + fuse-uring patches.

Thanks,
Bernd


