On Fri, Dec 6, 2024 at 10:27 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> > On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> > >
> > > hi, Yafang,
> > >
> > > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > > <oliver.sang@xxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > > > >
> > > > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > >
> > > > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > > > >
> > > > > in testcase: vm-scalability
> > > > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > > > with following parameters:
> > > > >
> > > > >         runtime: 300s
> > > > >         test: mmap-xread-seq-mt
> > > > >         cpufreq_governor: performance
> > > > >
> > > > >
> > > > >
> > > > > config: x86_64-rhel-9.4
> > > > > compiler: gcc-12
> > > > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > > > >
> > > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > > >
> > > > >
> > > > >
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add following tags
> > > > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@xxxxxxxxx
> > > > >
> > > > >
> > > > [...]
> > > >
> > > > Is this issue consistently reproducible?
> > > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > > was unsuccessful.
> > >
> > > in our tests, the issue is quite persistent. as below, 100% reproduced in all
> > > 8 runs, keeps clean on parent.
> > >
> > > d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> > > ---------------- ---------------------------
> > >        fail:runs  %reproduction    fail:runs
> > >            |             |             |
> > >            :8          100%           8:8     dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> > >            :8          100%           8:8     dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> > >
> > > to avoid any env issue, we rebuild kernel and rerun more to check. if still
> > > consistently reproduced, we will follow your further requests. thanks
> >
> > Although I’ve made extensive attempts, I haven’t been able to
> > reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> > case, ra->size might be increasing to an unexpectedly large value. If
> > that’s the case, I believe the issue can be resolved with the
> > following additional change:
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 9b8a48e736c6..e30132bc2593 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
> >                 return 4 * cur;
> >         if (cur <= max / 2)
> >                 return 2 * cur;
> > -       if (cur > max)
> > -               return cur;
> >         return max;
> >  }
> >
> > @@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
> >                             1UL << order);
> >         if (index == expected) {
> >                 ra->start += ra->size;
> > -               ra->size = get_next_ra_size(ra, max_pages);
> > +               /*
> > +                * For the MADV_HUGEPAGE case, the ra->size might be larger than
> > +                * the max_pages.
> > +                */
> > +               ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
> >                 ra->async_size = ra->size;
> >                 goto readit;
> >         }
> >
> > Could you please test this if you can consistently reproduce the bug?
>
> by this patch, we confirmed the issue gone on both platforms.
>
> Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>

Great! Thanks for your work. I'll send a new version.

--
Regards
Yafang
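
For readers following the thread, here is a rough userspace sketch of the sizing logic after the change quoted above. It is only a model, not the kernel source: struct ra_model, next_ra_size(), max_ul() and async_next_size() are hypothetical names, the condition guarding the "4 * cur" branch is not visible in the quoted hunk and is assumed here, and the values 32 and 512 pages in the note below are illustrative only.

```c
/* Simplified stand-in for the kernel's struct file_ra_state (hypothetical). */
struct ra_model {
	unsigned long size;	/* current readahead window, in pages */
};

/*
 * Model of get_next_ra_size() with the quoted patch applied: the
 * "if (cur > max) return cur;" special case is gone, so the result
 * never exceeds max.
 */
static unsigned long next_ra_size(const struct ra_model *ra, unsigned long max)
{
	unsigned long cur = ra->size;

	if (cur < max / 16)	/* assumed condition, not shown in the hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

/* Plain helper standing in for the kernel's max() macro. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Model of the size update in the async path after the patch: a window
 * that is already larger than max_pages (the MADV_HUGEPAGE case) is
 * kept as-is instead of being shrunk, and it is not grown further.
 */
static unsigned long async_next_size(const struct ra_model *ra,
				     unsigned long max_pages)
{
	return max_ul(ra->size, next_ra_size(ra, max_pages));
}
```

With max_pages = 32, a window of 4 pages grows to 8 through the doubling branch, while a window of 512 pages stays at 512 in the async path even though next_ra_size() alone would return 32.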