Re: [Bug 216072] New: regression: Hang at boot when DAMON is enabled

Cc-ing damon@xxxxxxxxxxxxxxx

Thank you for reporting this, Greg!  And thank you for forwarding this, Andrew!

On Sat, 4 Jun 2022 11:27:06 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Sat, 04 Jun 2022 15:49:50 +0000 bugzilla-daemon@xxxxxxxxxx wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=216072
> > 
> >             Bug ID: 216072
> >            Summary: regression: Hang at boot when DAMON is enabled
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 5.19 pre-rc1
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
> >           Reporter: gwhite@xxxxxxxxxxx
> >         Regression: No
> > 
> > I see a hang on boot whenever DAMON is enabled.  The specific commit that
> > causes this is listed below.  There is no printk / dmesg output, only the
> > message about an initrd being loaded by the EFI stub.  Then a hard hang.  Removing
> > the commit below - or disabling DAMON entirely - fixes the issue.
> > 
> > commit 059342d1dd4e01d634184793fa3f8437e62afaa1
> > Author: Hailong Tu <tuhailong@xxxxxxxxx>
> > Date:   Fri Apr 29 14:37:00 2022 -0700
> > 
> >     mm/damon/reclaim: fix the timer always stays active
> > 
> >     The timer stays active even if the reclaim mechanism is never enabled.  It
> >     is unnecessary overhead can be completely avoided by using
> >     module_param_cb() for enabled flag.
> > 
> >     Link:
> > https://lkml.kernel.org/r/20220421125910.1052459-1-tuhailong@xxxxxxxxx
> >     Signed-off-by: Hailong Tu <tuhailong@xxxxxxxxx>
> >     Reviewed-by: SeongJae Park <sj@xxxxxxxxxx>
> >     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Greg further mentioned that the issue is reproducible when the kernel is
booted with the damon_reclaim.enabled=Y parameter, and I was also able to
reproduce it on my test machine.

DAMON_RECLAIM calls 'schedule_delayed_work()', which uses 'system_wq', from a
parameter store callback ('enabled_store()'), which is called from
'parse_args()', which is in turn called from 'start_kernel()'.

And 'system_wq' is initialized from 'workqueue_init_early()', which is called
from 'start_kernel()' after 'parse_args()'.

Therefore 'schedule_delayed_work()' touches the uninitialized 'system_wq',
the init process hits a kernel NULL pointer dereference, and the system
hangs.
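
To make the ordering concrete, here is a rough user-space sketch of the
sequence described above.  It is only a model: every 'model_*' name is
hypothetical and stands in for the kernel function named in its comment, and
the scheduling stand-in reports failure instead of dereferencing NULL so that
the hazard can be observed without crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a workqueue; not the kernel's workqueue_struct. */
struct model_wq {
	int queued;
};

static struct model_wq model_wq_storage;
/* Stand-in for 'system_wq': NULL until the early init step has run. */
static struct model_wq *model_system_wq;

/* Stand-in for workqueue_init_early(): makes the queue usable. */
static void model_workqueue_init_early(void)
{
	model_wq_storage.queued = 0;
	model_system_wq = &model_wq_storage;
}

/*
 * Stand-in for schedule_delayed_work().  The model returns false where the
 * real call would touch the not-yet-initialized queue.
 */
static bool model_schedule_work(void)
{
	if (model_system_wq == NULL)
		return false;
	model_system_wq->queued++;
	return true;
}

/* Stand-in for the enabled_store() callback invoked by parse_args(). */
static bool model_enabled_store(void)
{
	return model_schedule_work();
}

/* Order described in this mail: parameters are parsed before the queue exists. */
static bool model_boot_params_first(void)
{
	bool ok;

	model_system_wq = NULL;
	ok = model_enabled_store();	/* parse_args() step */
	model_workqueue_init_early();	/* runs too late for the callback */
	return ok;
}

/* Reversed order, for contrast only. */
static bool model_boot_wq_first(void)
{
	model_system_wq = NULL;
	model_workqueue_init_early();
	assert(model_system_wq != NULL);
	return model_enabled_store();
}
```

In this model, 'model_boot_params_first()' reports that the work could not be
queued, while 'model_boot_wq_first()' queues it once.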

I further confirmed that the simple change below fixes this issue.  I will
format it as a patch and send it soon.

diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 53c0c084f046..78984c8d1047 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -374,6 +374,8 @@ static void damon_reclaim_timer_fn(struct work_struct *work)
 }
 static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn);

+static bool damon_reclaim_initialized;
+
 static int enabled_store(const char *val,
                const struct kernel_param *kp)
 {
@@ -382,6 +384,9 @@ static int enabled_store(const char *val,
        if (rc < 0)
                return rc;

+       if (!damon_reclaim_initialized)
+               return rc;
+
        if (enabled)
                schedule_delayed_work(&damon_reclaim_timer, 0);

@@ -450,6 +455,8 @@ static int __init damon_reclaim_init(void)
        damon_add_target(ctx, target);

        schedule_delayed_work(&damon_reclaim_timer, 0);
+
+       damon_reclaim_initialized = true;
        return 0;
 }
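
The gating logic of the change above can also be exercised outside the kernel.
The following is a minimal user-space sketch of the same idea, not the kernel
code itself: all 'model_*' names are hypothetical, and a counter stands in for
the 'schedule_delayed_work()' calls.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_enabled;	/* value of the "enabled" parameter */
static bool model_initialized;	/* mirrors damon_reclaim_initialized */
static int model_scheduled;	/* times the work would have been scheduled */

/*
 * Stand-in for enabled_store(): always record the value, but only schedule
 * work once the init function has run.
 */
static int model_enabled_store(bool val)
{
	model_enabled = val;

	if (!model_initialized)
		return 0;

	if (model_enabled)
		model_scheduled++;
	return 0;
}

/* Stand-in for damon_reclaim_init(): schedule once, then open the gate. */
static int model_reclaim_init(void)
{
	assert(!model_initialized);	/* the model expects init to run once */
	model_scheduled++;
	model_initialized = true;
	return 0;
}
```

A store that arrives before init only records the value.  The work is first
scheduled from the init function, and later stores schedule it as before.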



Thanks,
SJ

[...]



