On Thu, Oct 7, 2021 at 12:08 PM Hsin-Yi Wang <hsinyi@xxxxxxxxxxxx> wrote:
>
> On Wed, Oct 6, 2021 at 9:12 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Oct 06, 2021 at 09:07:56PM +0800, Hsin-Yi Wang wrote:
> > > On Wed, Oct 6, 2021 at 7:21 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Oct 06, 2021 at 05:25:23PM +0800, Hsin-Yi Wang wrote:
> > > > > Hi Matthew,
> > > > >
> > > > > We tested that the performance of readahead is regressed on multicore
> > > > > arm64 platforms running on the 5.10 kernel.
> > > > > - The platform we used: 8 cores (4x a53(small), 4x a73(big)) arm64 platform
> > > > > - The command we used: ureadahead $FILE ($FILE is a 1MB+ pack file,
> > > > > note that if the file size is small, it's not obvious to see the
> > > > > regression)
> > > > >
> > > > > After we revert the commit c1f6925e1091("mm: put readahead pages in
> > > > > cache earlier"), the readahead performance is back:
> > > > > - time ureadahead $FILE:
> > > > >   - 5.10: 1m23.124s
> > > > >   - with c1f6925e1091 reverted: 0m3.323s
> > > > >   - other LTS kernel (eg. 5.4): 0m3.066s
> > > > >
> > > > > The slowest part is aops->readpage() in read_pages() called in
> > > > > read_pages(ractl, &page_pool, false); (the 3rd in
> > > > > page_cache_ra_unbounded())
> > > >
> > > > What filesystem are you using?
> > > >
> > > ext4, block size 4096
> >
> > That's confusing. ext4 shouldn't hit that path; it has a ->readahead
> > address space operation.
>
> Sorry for the confusion, both readahead and readpage are called.
> The ->readpage is called by vfs: vfs_fadvise.
> (Full path)
> read_pages

This calls into squashfs_readpage(). The data pasted earlier was
collected with SQUASHFS_DECOMP_SINGLE. However, with the
SQUASHFS_DECOMP_MULTI_PERCPU config:
- 5.10:
  1. real 0m1.692s, sys 0m4.188s
  2. real 0m1.655s, sys 0m4.175s
- 5.10 with c1f6925e1091 reverted:
  1. real 0m1.549s, sys 0m3.616s
  2. real 0m1.603s, sys 0m3.638s
The revert is still slightly faster, but the difference is much smaller
than with SQUASHFS_DECOMP_SINGLE.

> page_cache_ra_unbounded
> do_page_cache_ra
> force_page_cache_ra
> generic_fadvise
> vfs_fadvise
> ksys_readahead
> __arm64_compat_sys_aarch32_readahead
> el0_svc_common
> do_el0_svc_compat
> el0_svc_compat
> el0_sync_compat_handler
> el0_sync_compat
>
> The ->readahead is called by ext4: ext4_file_read_iter. But this part is fast.
> (Full path)
> read_pages

This calls into ext4_readahead().

> page_cache_ra_unbounded
> do_page_cache_ra
> ondemand_readahead
> page_cache_sync_ra
> generic_file_buffered_read
> generic_file_read_iter
> ext4_file_read_iter
> do_iter_readv_writev
> do_iter_read
> vfs_iter_read
> loop_queue_work
> kthread_worker_fn
> loop_kthread_worker_fn
> kthread
> ret_from_fork
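
For anyone who wants to time the forced-readahead path (the first trace above) without ureadahead, here is a minimal sketch. It is not what ureadahead does internally. It assumes that a POSIX_FADV_WILLNEED hint over the whole file goes through generic_fadvise and force_page_cache_ra in the same way as the readahead() syscall in that trace. The function name time_willneed is made up for illustration.

```python
import os
import time


def time_willneed(path):
    """Issue POSIX_FADV_WILLNEED over the whole file.

    Returns (elapsed_seconds, file_size_in_bytes).
    Assumption: on Linux this hint takes the generic_fadvise ->
    force_page_cache_ra path shown in the first trace above.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        start = time.perf_counter()
        # Ask the kernel to read the byte range [0, size) into the page cache.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed, size
```

Calling time_willneed() on the pack file once on 5.10 and once with c1f6925e1091 reverted should give numbers comparable to the ones above, as long as the page cache is dropped before each run. Otherwise the second run only measures a cache hit.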