On Wed, Jan 10, 2024 at 11:35 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> Huang, Ying <ying.huang@xxxxxxxxx> wrote on Tue, Jan 9, 2024 at 10:05:
> >
> > Kairui Song <ryncsn@xxxxxxxxx> writes:
> >
> > > From: Kairui Song <kasong@xxxxxxxxxxx>
> > >
> > > Currently, shmem uses cluster readahead for all swap backends. Cluster
> > > readahead is not a good solution for ramdisk based devices (e.g. ZRAM) at all.
> > >
> > > After switching to the new helper, most benchmarks showed a good result:
> > >
> > > - Single file sequential read:
> > > perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
> > > (/tmpfs/test is a zero filled file, using brd as swap, 4G memcg limit)
> > > Before: 22.248 +- 0.549
> > > After: 22.021 +- 0.684 (-1.1%)
> > >
> > > - Random read stress test:
> > > fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
> > > --size=256m --ioengine=mmap --rw=randread --random_distribution=random \
> > > --time_based --ramp_time=1m --runtime=5m --group_reporting
> > > (using brd as swap, 2G memcg limit)
> > >
> > > Before: 1818MiB/s
> > > After: 1888MiB/s (+3.85%)
> > >
> > > - Zipf biased random read stress test:
> > > fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
> > > --size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
> > > --time_based --ramp_time=1m --runtime=5m --group_reporting
> > > (using brd as swap, 2G memcg limit)
> > >
> > > Before: 31.1GiB/s
> > > After: 32.3GiB/s (+3.86%)
> > >
> > > So cluster readahead doesn't help much even for single-file sequential
> > > read, and for the random stress tests, performance is better without it.
> > >
> > > Considering that both memory and the swap device will slowly get more
> > > fragmented, and that the commonly used ZRAM consumes much more CPU than
> > > a plain ramdisk, false readahead could occur more frequently and waste
> > > more CPU. Direct swapin is cheaper, so use the new helper and skip
> > > readahead for SWP_SYNCHRONOUS_IO devices.
> >
> > It's good to take advantage of swap_direct (no readahead). I also hope
> > we can take advantage of VMA based swapin if shmem is accessed via mmap.
> > That appears possible.
>
> Good idea, that should be doable, will update the series.
Hi Ying,

It turns out it's quite complex to do VMA based swapin readahead for shmem:
the VMA addresses / page tables don't contain swap entries for shmem. For
anon pages, simply reading the nearby page table entries is easy and good
enough, but for shmem the swap entries are stored in the inode mapping, so
the readahead would need to walk the inode mapping instead. That's doable
but requires more work to make it actually usable. I've sent V3 without
this feature; this readahead extension is worth another series.
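
To illustrate the idea, here is a rough sketch of how such a helper could
gather swap entries by walking the inode mapping instead of page tables.
The function name, the fixed window and the entries[] output buffer are
all made up for illustration and are not part of the posted series:

#include <linux/pagemap.h>
#include <linux/swapops.h>
#include <linux/xarray.h>

/*
 * Illustrative sketch only: gather up to @win swap entries around @index
 * by walking the shmem inode's XArray (mapping->i_pages) under RCU,
 * instead of walking nearby page tables as anon VMA readahead does.
 */
static unsigned int shmem_swapin_ra_window(struct address_space *mapping,
                                           pgoff_t index, unsigned int win,
                                           swp_entry_t *entries)
{
        pgoff_t start = index > win / 2 ? index - win / 2 : 0;
        XA_STATE(xas, &mapping->i_pages, start);
        unsigned int nr = 0;
        void *entry;

        rcu_read_lock();
        xas_for_each(&xas, entry, start + win - 1) {
                if (xas_retry(&xas, entry))
                        continue;
                /* In a shmem mapping, value entries are swapped-out pages. */
                if (!xa_is_value(entry))
                        continue;
                entries[nr++] = radix_to_swp_entry(entry);
                if (nr == win)
                        break;
        }
        rcu_read_unlock();

        return nr;
}

The harder part, which the sketch skips, is choosing a sensible window and
actually issuing the swapins for those entries, which is why it looks
better suited to a follow-up series.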