hi, Yafang,

On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> >
> > hi, Yafang,
> >
> > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > <oliver.sang@xxxxxxxxx> wrote:
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > > >
> > > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > > >
> > > > in testcase: vm-scalability
> > > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > > with following parameters:
> > > >
> > > >         runtime: 300s
> > > >         test: mmap-xread-seq-mt
> > > >         cpufreq_governor: performance
> > > >
> > > >
> > > >
> > > > config: x86_64-rhel-9.4
> > > > compiler: gcc-12
> > > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > > >
> > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > >
> > > >
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@xxxxxxxxx
> > > >
> > > >
> > > > [...]
> > >
> > > Is this issue consistently reproducible?
> > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > was unsuccessful.
> >
> > in our tests, the issue is quite persistent. as below, 100% reproduced in all
> > 8 runs, keeps clean on parent.
> >
> > d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :8          100%           8:8     dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> >            :8          100%           8:8     dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> >
> > to avoid any env issue, we rebuild kernel and rerun more to check. if still
> > consistently reproduced, we will follow your further requests. thanks
>
> Although I’ve made extensive attempts, I haven’t been able to
> reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> case, ra->size might be increasing to an unexpectedly large value. If
> that’s the case, I believe the issue can be resolved with the
> following additional change:
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 9b8a48e736c6..e30132bc2593 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct
> file_ra_state *ra,
>                 return 4 * cur;
>         if (cur <= max / 2)
>                 return 2 * cur;
> -       if (cur > max)
> -               return cur;
>         return max;
>  }
>
> @@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
>                         1UL << order);
>         if (index == expected) {
>                 ra->start += ra->size;
> -               ra->size = get_next_ra_size(ra, max_pages);
> +               /*
> +                * For the MADV_HUGEPAGE case, the ra->size might be larger than
> +                * the max_pages.
> +                */
> +               ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
>                 ra->async_size = ra->size;
>                 goto readit;
>         }
>
> Could you please test this if you can consistently reproduce the bug?

by this patch, we confirmed the issue gone on both platforms.

Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>

below d18114f8dcb33d7ed6216673903 is just your patch

on Cooper Lake in our original report

d1aa0c04294e2988 13da30d6f9150dff876f94a3f32 d18114f8dcb33d7ed6216673903
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :20          75%          15:20           0%            :20    dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
           :20          75%          15:20           0%            :20    dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks

on another Ice Lake platform

d1aa0c04294e2988 13da30d6f9150dff876f94a3f32 d18114f8dcb33d7ed6216673903
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :10          50%           5:10           0%            :20    dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
           :10          50%           5:10           0%            :20    dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks

>
> --
> Regards
> Yafang
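
For anyone reading the diff in isolation, here is a rough userspace sketch of the size update it leaves behind. It is only a model, not the kernel code: `next_ra_size()` and `async_ra_size_patched()` are hypothetical stand-ins for `get_next_ra_size()` and the `ra->size` assignment in `page_cache_async_ra()`. The first condition (`cur < max / 16`) is not visible in the quoted hunk and is assumed here. The numbers used below (a 512-page current size against a 32-page `max_pages`) are made-up illustration values.

```c
/*
 * Userspace model of the patched readahead size update.
 * Not kernel code: names and the first condition are assumptions.
 */

/* Stand-in for get_next_ra_size() after the "cur > max" branch is removed. */
static unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed: this condition is outside the quoted hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;		/* the helper never returns more than max here */
}

/* Open-coded max() so the sketch needs no kernel headers. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Stand-in for the patched assignment in page_cache_async_ra():
 *   ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
 * A current size already above max_pages is kept rather than shrunk.
 */
static unsigned long async_ra_size_patched(unsigned long cur,
					   unsigned long max_pages)
{
	return max_ul(cur, next_ra_size(cur, max_pages));
}
```

With these definitions, `async_ra_size_patched(512, 32)` keeps 512, while `next_ra_size(512, 32)` on its own gives 32. Only the `page_cache_async_ra()` call site carries the `max()` in the diff, so any other caller of the helper would see the value capped at `max`.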