Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]

On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@xxxxxxxxx> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > <oliver.sang@xxxxxxxxx> wrote:
> > >
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > >
> > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > >
> > > in testcase: vm-scalability
> > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > with following parameters:
> > >
> > >         runtime: 300s
> > >         test: mmap-xread-seq-mt
> > >         cpufreq_governor: performance
> > >
> > >
> > >
> > > config: x86_64-rhel-9.4
> > > compiler: gcc-12
> > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@xxxxxxxxx
> > >
> > >
>
> [...]
>
> >
> > Is this issue consistently reproducible?
> > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > was unsuccessful.
>
> in our tests, the issue is quite persistent. as below, 100% reproduced in all
> 8 runs, keeps clean on parent.
>
> d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :8          100%           8:8     dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
>            :8          100%           8:8     dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> to avoid any env issue, we rebuild kernel and rerun more to check. if still
> consistently reproduced, we will follow your further requests. thanks

Although I’ve made extensive attempts, I haven’t been able to
reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
case, ra->size might be increasing to an unexpectedly large value. If
that’s the case, I believe the issue can be resolved with the
following additional change:

diff --git a/mm/readahead.c b/mm/readahead.c
index 9b8a48e736c6..e30132bc2593 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
                return 4 * cur;
        if (cur <= max / 2)
                return 2 * cur;
-       if (cur > max)
-               return cur;
        return max;
 }

@@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
                        1UL << order);
        if (index == expected) {
                ra->start += ra->size;
-               ra->size = get_next_ra_size(ra, max_pages);
+               /*
+                * For the MADV_HUGEPAGE case, the ra->size might be larger than
+                * the max_pages.
+                */
+               ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
                ra->async_size = ra->size;
                goto readit;
        }

Could you please test this if you can consistently reproduce the bug?
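
To make the intent of the change easier to check, here is a minimal userspace sketch of the sizing logic before and after the diff. It is a model only, not kernel code: the function names are hypothetical, sizes are plain page counts, and the first branch (cur < max / 16) is not visible in the hunk, so it is assumed here from the upstream helper.

```c
/*
 * Userspace model of the readahead sizing discussed above.
 * Names are hypothetical; this is not the kernel implementation.
 */

/* Helper as it stands with the guard that the diff removes. */
unsigned long next_ra_size_with_guard(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed branch, not shown in the hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	if (cur > max)		/* guard removed by the diff */
		return cur;
	return max;
}

/* Helper after the diff: the result never exceeds max. */
unsigned long next_ra_size_proposed(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed branch, not shown in the hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

/* Call-site update after the diff: keep the current size if it is larger. */
unsigned long async_ra_size_proposed(unsigned long cur, unsigned long max)
{
	unsigned long next = next_ra_size_proposed(cur, max);

	return cur > next ? cur : next;
}
```

In this model, with max = 32 pages and a current size of 512 pages, the guarded helper returns 512, the proposed helper returns 32, and the proposed call-site update still returns 512. So the async path keeps an oversized window while the helper itself is capped at max again for anything else that calls it.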

--
Regards
Yafang




