Re: Readahead regressed with c1f6925e1091("mm: put readahead pages in cache earlier") on multicore arm64 platforms

On Thu, Oct 7, 2021 at 12:08 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
>
> On Wed, Oct 6, 2021 at 9:12 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Oct 06, 2021 at 09:07:56PM +0800, Hsin-Yi Wang wrote:
> > > On Wed, Oct 6, 2021 at 7:21 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Oct 06, 2021 at 05:25:23PM +0800, Hsin-Yi Wang wrote:
> > > > > Hi Matthew,
> > > > >
> > > > > We tested that the performance of readahead is regressed on multicore
> > > > > arm64 platforms running on the 5.10 kernel.
> > > > > - The platform we used: 8 cores (4x a53(small), 4x a73(big)) arm64 platform
> > > > > - The command we used: ureadahead $FILE ($FILE is a 1MB+ pack file,
> > > > > note that if the file size is small, it's not obvious to see the
> > > > > regression)
> > > > >
> > > > > After we revert the commit c1f6925e1091("mm: put readahead pages in
> > > > > cache earlier"), the readahead performance is back:
> > > > > - time ureadahead $FILE:
> > > > >   - 5.10: 1m23.124s
> > > > >   - with c1f6925e1091 reverted: 0m3.323s
> > > > >   - other LTS kernel (eg. 5.4): 0m3.066s
> > > > >
> > > > > The slowest part is aops->readpage() in read_pages() called in
> > > > > read_pages(ractl, &page_pool, false); (the 3rd in
> > > > > page_cache_ra_unbounded())
> > > >
> > > > What filesystem are you using?
> > > >
> > > ext4, block size 4096
> >
> > That's confusing.  ext4 shouldn't hit that path; it has a ->readahead
> > address space operation.
>
> Sorry for the confusion, both readahead and readpage are called.
> The ->readpage is called by vfs: vfs_fadvise.
> (Full path)
> read_pages

This calls into squashfs_readpage().
The timings posted earlier were measured with SQUASHFS_DECOMP_SINGLE.
With the SQUASHFS_DECOMP_MULTI_PERCPU config instead:
- 5.10:
  1. real 0m1.692s, sys 0m4.188s
  2. real 0m1.655s, sys 0m4.175s
- 5.10 with c1f6925e1091 reverted:
  1. real 0m1.549s, sys 0m3.616s
  2. real 0m1.603s, sys 0m3.638s
The revert is still slightly faster, but the gap is much smaller than
with SQUASHFS_DECOMP_SINGLE.
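For a rough sense of scale, here is a quick calculation over the numbers
in this thread: the 1m23.124s vs 0m3.323s DECOMP_SINGLE runs from my
first mail, and the two MULTI_PERCPU runs above. With only one or two
runs per kernel, please treat the ratios as indicative only:

```python
# Timings in seconds, copied from the measurements in this thread.
single = {"5.10": 83.124, "reverted": 3.323}  # SQUASHFS_DECOMP_SINGLE
percpu_real = {"5.10": [1.692, 1.655], "reverted": [1.549, 1.603]}
percpu_sys = {"5.10": [4.188, 4.175], "reverted": [3.616, 3.638]}


def mean(xs):
    return sum(xs) / len(xs)


def slowdown(with_commit, reverted):
    """Ratio > 1 means 5.10 (with c1f6925e1091) is slower than the revert."""
    return with_commit / reverted


single_ratio = slowdown(single["5.10"], single["reverted"])
real_ratio = slowdown(mean(percpu_real["5.10"]), mean(percpu_real["reverted"]))
sys_ratio = slowdown(mean(percpu_sys["5.10"]), mean(percpu_sys["reverted"]))

print(f"DECOMP_SINGLE:      {single_ratio:.1f}x slower")  # 25.0x
print(f"MULTI_PERCPU real:  {real_ratio:.2f}x slower")    # 1.06x
print(f"MULTI_PERCPU sys:   {sys_ratio:.2f}x slower")     # 1.15x
```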

> page_cache_ra_unbounded
> do_page_cache_ra
> force_page_cache_ra
> generic_fadvise
> vfs_fadvise
> ksys_readahead
> __arm64_compat_sys_aarch32_readahead
> el0_svc_common
> do_el0_svc_compat
> el0_svc_compat
> el0_sync_compat_handler
> el0_sync_compat
>
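To hit the vfs_fadvise() path without ureadahead, a sketch like the
following could be used. It is only an approximation and an assumption
on my side: it issues a single POSIX_FADV_WILLNEED hint over the whole
file, which reaches vfs_fadvise() -> generic_fadvise() ->
force_page_cache_ra() through the fadvise syscall rather than through
readahead(2), because Python's os module does not wrap readahead(2).
The helper name willneed() is hypothetical.

```python
import os
import sys
import time


def willneed(path):
    """Hint the kernel to pull the whole file into the page cache.

    Returns the wall-clock time spent inside posix_fadvise(). For a
    filesystem whose ->readpage() does its work synchronously (e.g.
    squashfs decompression), most of the cost should show up here.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        start = time.monotonic()
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Drop caches first (as root: echo 3 > /proc/sys/vm/drop_caches),
    # otherwise the pages may already be cached and nothing is read.
    for path in sys.argv[1:]:
        print(f"{path}: {willneed(path):.3f}s")
```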
> The ->readahead is called by ext4: ext4_file_read_iter. But this part is fast.
> (Full path)
> read_pages

This calls into ext4_readahead().

> page_cache_ra_unbounded
> do_page_cache_ra
> ondemand_readahead
> page_cache_sync_ra
> generic_file_buffered_read
> generic_file_read_iter
> ext4_file_read_iter
> do_iter_readv_writev
> do_iter_read
> vfs_iter_read
> loop_queue_work
> kthread_worker_fn
> loop_kthread_worker_fn
> kthread
> ret_from_fork
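The same read_pages() call ends up in two different places because it
dispatches on the address_space_operations of the file being read: in
5.10, squashfs only provides ->readpage, while ext4 provides
->readahead. Below is a heavily simplified Python model of that
dispatch. The ->readpages branch and all page-cache details are left
out, and AddressSpaceOps and the lambdas are hypothetical stand-ins,
not kernel API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AddressSpaceOps:
    """Hypothetical stand-in for struct address_space_operations."""
    name: str
    readahead: Optional[Callable[[List[int]], None]] = None
    readpage: Optional[Callable[[int], None]] = None


def read_pages(aops: AddressSpaceOps, pages: List[int]) -> str:
    """Simplified model of read_pages() in 5.10 mm/readahead.c.

    ->readahead, if present, gets the whole batch in one call;
    otherwise ->readpage is called once per page.
    """
    if not pages:
        return "none"
    if aops.readahead is not None:
        aops.readahead(pages)
        return "readahead"
    for index in pages:
        aops.readpage(index)
    return "readpage"


calls: List[str] = []
squashfs = AddressSpaceOps(
    "squashfs", readpage=lambda i: calls.append(f"squashfs_readpage({i})"))
ext4 = AddressSpaceOps(
    "ext4", readahead=lambda p: calls.append(f"ext4_readahead({len(p)} pages)"))

print(read_pages(squashfs, [0, 1, 2]))  # readpage
print(read_pages(ext4, [0, 1, 2]))      # readahead
print(calls)
```

In this model the fadvise path on the squashfs file makes one
->readpage() call per page, while the loop worker reading the backing
ext4 file hands the whole batch to ext4_readahead() at once.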
