RE: Kernel crash at md_seq_show of drivers/md/md.c

> On Wed, 15 May 2013 20:11:52 +0800 "Mo, Moore" <Moore.Mo@xxxxxxxxxxx> wrote:
> 
> > Dear Neil,
> > 
> >     Thank you for the hint. I traced deeper after that and found another suspect: the kernel may be waking up an mdk_thread that was already kfree'd in the failure path of mddev->pers->run(mddev).
> > 
> >     To confirm my guess, I added some printk calls to get more information. It seems that md_seq_show tries to wake up an mdk_thread in mddev_unlock() before md_seq_show returns, but the thread was already kfree'd by the "out_free_conf" path of the run() function in drivers/md/raid10.c.
> > 
> >     The Oops report is attached.
> > 
> 
> I don't really have time for bug reports against ancient kernels - sorry.
> If you think you've found a bug, please at least look in the current mainline kernel to see if the code has changed, and preferably reproduce against the current mainline kernel.
> 
> In this case the bug was fixed by commit
>       01f96c0a9922cd9919baf9d16febdf7016177a12
> 18 months ago (linux 3.1).
> 

Thanks for the information, and sorry that I did not check the latest mainline kernel this time. I will do so in future.

I have merged the patch from Linux 3.1 (with a little adaptation for 2.6.37) and verified it on the setup where the issue occurred. It works well: the Oops has disappeared. Thank you.

May I ask a follow-up question? There is a NULL pointer check in md_wakeup_thread. Why not set the thread pointer to NULL immediately after md_unregister_thread in the failure path of run(), just as the raid5 failure path does in 01f96c0a9922cd9919baf9d16febdf7016177a12~1 (the removed lines in the diff below)?

-------------------------------------------------
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 43709fa..ac5e8b5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4941,8 +4941,7 @@ static int run(mddev_t *mddev)

        return 0;
 abort:
-       md_unregister_thread(mddev->thread);
-       mddev->thread = NULL;
+       md_unregister_thread(&mddev->thread);
        if (conf) {
                print_raid5_conf(conf);
                free_conf(conf);
-------------------------------------------------

I surmise the reason is that run() executes in the context pointed to by mddev->thread itself. Is that right?
But I tried this approach before writing this mail (in fact, before merging the commit from 3.1), and it also seems to work well: the "wakeup thread oops" disappeared too, and no other oops or KERN_ERR output appeared.
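
For illustration, here is a minimal userspace sketch of the pattern being asked about. It is not kernel code: struct worker, struct device, wakeup_worker, unregister_worker and run_fails are hypothetical names standing in for mdk_thread_t, mddev_t, md_wakeup_thread, md_unregister_thread and a failing run(). It only shows why clearing the pointer before freeing lets a later NULL-checked wakeup become a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for mdk_thread_t. */
struct worker {
    int wakeups;
};

/* Hypothetical stand-in for mddev_t; only the thread pointer matters here. */
struct device {
    struct worker *thread;
};

/* Mirrors the NULL check discussed above: a NULL worker is ignored. */
static void wakeup_worker(struct worker *w)
{
    if (w)
        w->wakeups++;
}

/* Takes the address of the pointer so it can clear it before freeing. */
static void unregister_worker(struct worker **wp)
{
    struct worker *w;

    assert(wp != NULL);
    w = *wp;
    if (!w)
        return;
    *wp = NULL;   /* later wakeups now see NULL, not freed memory */
    free(w);
}

/* Simulated run() whose setup always fails after the worker was created. */
static int run_fails(struct device *dev)
{
    dev->thread = calloc(1, sizeof(*dev->thread));
    if (!dev->thread)
        return -1;
    unregister_worker(&dev->thread);   /* failure path */
    return -1;
}
```

After run_fails(&dev) returns, dev.thread is NULL, so wakeup_worker(dev.thread) does nothing instead of touching freed memory. The sketch is single-threaded and deliberately leaves out locking, so it says nothing about a wakeup racing with the unregister on another CPU.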

This question is just out of technical interest. Please excuse me if it disturbs you.



Best Regards,
Moore(莫谋鑫)