Hi Matthew,

We found that readahead performance has regressed on multicore arm64
platforms running the 5.10 kernel.

- The platform we used: an 8-core arm64 platform (4x Cortex-A53 (little) +
  4x Cortex-A73 (big))
- The command we used: ureadahead $FILE
  ($FILE is a 1MB+ pack file; note that if the file is small, the
  regression is not obvious)

After reverting commit c1f6925e1091 ("mm: put readahead pages in cache
earlier"), readahead performance recovers:

- time ureadahead $FILE:
  - 5.10: 1m23.124s
  - with c1f6925e1091 reverted: 0m3.323s
  - other LTS kernels (e.g. 5.4): 0m3.066s

The slowest part is the aops->readpage() loop in read_pages(), reached
via the third read_pages() call in page_cache_ra_unbounded():
read_pages(ractl, &page_pool, false);

static void read_pages(struct readahead_control *rac, struct list_head *pages,
		bool skip_page)
{
	...
	if (aops->readahead) {
		...
	} else if (aops->readpages) {
		...
	} else {
		while ((page = readahead_page(rac))) {
			aops->readpage(rac->file, page); /* most of the time is spent on this line */
			put_page(page);
		}
	}
	...
}

We also found the following relevant measurements:

- time ureadahead $FILE on 5.10:
  - taskset ureadahead to a little core: 0m7.411s
  - taskset ureadahead to a big core: 0m5.982s

Compared to the original 1m23s, pinning the ureadahead task to a single
core also closes the gap.

Do you have any idea why moving pages into the page cache earlier, and
then doing the page reads later, causes such a difference?

Thanks,
Hsin-Yi