hi, Yafang,

On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
> >
> > hi, Yafang,
> >
> > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > <oliver.sang@xxxxxxxxx> wrote:
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > > >
> > > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > > >
> > > > in testcase: vm-scalability
> > > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > > with following parameters:
> > > >
> > > >         runtime: 300s
> > > >         test: mmap-xread-seq-mt
> > > >         cpufreq_governor: performance
> > > >
> > > >
> > > >
> > > > config: x86_64-rhel-9.4
> > > > compiler: gcc-12
> > > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > > >
> > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > >
> > > >
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@xxxxxxxxx
> > > >
> > > >
> > > > [...]
> > > >
> > >
> > > Is this issue consistently reproducible?
> > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > was unsuccessful.
> >
> > in our tests, the issue is quite persistent. as below, 100% reproduced in all
> > 8 runs, keeps clean on parent.
> >
> > d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :8          100%           8:8     dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> >            :8          100%           8:8     dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> >
> > to avoid any env issue, we rebuild kernel and rerun more to check. if still
> > consistently reproduced, we will follow your further requests. thanks
>
> Although I’ve made extensive attempts, I haven’t been able to
> reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> case, ra->size might be increasing to an unexpectedly large value. If
> that’s the case, I believe the issue can be resolved with the
> following additional change:

sorry that our service ran into some problems these two days and we were
busy fixing them, so I could not address your request quickly.

here is a quick update. we rebuilt the kernel and reran the tests more
times; the issue still seems persistent.

d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :20          75%          15:20    dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
           :20          75%          15:20    dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks

to rule out env issues on this machine, we ran the same tests on another
Ice Lake platform and still see similar issues, though the rate seems a
little lower.

d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :10          50%           5:10    dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
           :10          50%           5:10    dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks

we will test your patch below and update you with the results. thanks.

>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 9b8a48e736c6..e30132bc2593 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct
> file_ra_state *ra,
>                 return 4 * cur;
>         if (cur <= max / 2)
>                 return 2 * cur;
> -       if (cur > max)
> -               return cur;
>         return max;
>  }
>
> @@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
>                         1UL << order);
>         if (index == expected) {
>                 ra->start += ra->size;
> -               ra->size = get_next_ra_size(ra, max_pages);
> +               /*
> +                * For the MADV_HUGEPAGE case, the ra->size might be larger than
> +                * the max_pages.
> +                */
> +               ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
>                 ra->async_size = ra->size;
>                 goto readit;
>         }
>
> Could you please test this if you can consistently reproduce the bug?
>
> --
> Regards
> Yafang
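
To make the effect of the two hunks easier to follow, here is a minimal user-space sketch of the size ramp before and after the change. It is only a model, not the kernel code: the function names are hypothetical, the functions take the current window size directly instead of a struct file_ra_state, and the first branch (cur < max / 16) is an assumption, since that line is not visible in the quoted hunk.

```c
/* User-space model only; all names here are hypothetical stand-ins. */

/* Ramp as in the commit under test: a window already above max is kept. */
unsigned long next_ra_size_before(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed branch, not shown in the quoted hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	if (cur > max)
		return cur;
	return max;
}

/* Ramp with the first hunk applied: the result never exceeds max. */
unsigned long next_ra_size_after(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed branch, not shown in the quoted hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

/* Plain helper standing in for the kernel's max() macro. */
unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Async path with the second hunk applied: the window is never shrunk,
 * so an already oversized window (the MADV_HUGEPAGE case in the patch
 * comment) is carried over unchanged.
 */
unsigned long async_ra_size_after(unsigned long cur, unsigned long max_pages)
{
	return max_ul(cur, next_ra_size_after(cur, max_pages));
}
```

With max_pages = 32 as an illustrative value, a window of 512 pages stays at 512 in the async path both before and after the patch. The difference is for any other caller of the ramp function, which would get 32 instead of 512 once the first hunk is applied.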