On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > <oliver.sang@xxxxxxxxx> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > >
> > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > >
> > > in testcase: vm-scalability
> > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > with following parameters:
> > >
> > > 	runtime: 300s
> > > 	test: mmap-xread-seq-mt
> > > 	cpufreq_governor: performance
> > >
> > >
> > >
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-12
> > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@xxxxxxxxx
> > >
> > >
> >
> > [...]
> >
> > Is this issue consistently reproducible?
> > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > was unsuccessful.
>
> in our tests, the issue is quite persistent. as below, 100% reproduced in all
> 8 runs, keeps clean on parent.
>
> d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :8          100%           8:8   dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
>            :8          100%           8:8   dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> to avoid any env issue, we rebuild kernel and rerun more to check. if still
> consistently reproduced, we will follow your further requests. thanks

Although I’ve made extensive attempts, I haven’t been able to reproduce the
issue. My best guess is that, in the non-MADV_HUGEPAGE case, ra->size might be
increasing to an unexpectedly large value. If that’s the case, I believe the
issue can be resolved with the following additional change:

diff --git a/mm/readahead.c b/mm/readahead.c
index 9b8a48e736c6..e30132bc2593 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
 		return 4 * cur;
 	if (cur <= max / 2)
 		return 2 * cur;
-	if (cur > max)
-		return cur;
 	return max;
 }
 
@@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
 			1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		/*
+		 * For the MADV_HUGEPAGE case, the ra->size might be larger than
+		 * the max_pages.
+		 */
+		ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
 		ra->async_size = ra->size;
 		goto readit;
 	}

Could you please test this if you can consistently reproduce the bug?

--
Regards
Yafang