> On Wed, 15 May 2013 20:11:52 +0800 "Mo, Moore" <Moore.Mo@xxxxxxxxxxx> wrote: > > > Dear Neil, > > > > Thank you for your clue. I make deeper trace after that and found another suspicion point. Kernel maybe wakeup a mdk_thread which was kfree in failed case of mddev->pers->run(mddev). > > > > For make sure my guess, I add some printk in order to get more info. It seems that md_seq_show try to wake up a mdk_thread in mddev_unlock() before md_seq_show return, but the thread was kfree already by "out_free_conf" case of run() function in drivers/md/raid10.c. > > > > The Oops report as attached. > > > > I don't really have time for bug reports against ancient kernels - sorry. > If you think you've found a bug, please at least look in the current mainline kernel to see if the code has changed, and preferably reproduce against the current mainline kernel. > > In this case the bug was fixed by commit > 01f96c0a9922cd9919baf9d16febdf7016177a12 > 18 months ago (linux 3.1). > Thanks your information and sorry about that I haven't look through latest mainline kernel this time. I will pay attention to it in future. I have merged the patch from linux 3.1 (with a little adaptation for 2.6.37), and verified on my issue theater. it works well. The Oops disappeared. Thank you. Could you mind I ask an extended question: There is a NULL pointer protector in md_wakeup_thread. Why not set the thread pointer to NULL instant after md_unregister_thread in fail case of run? Just like as fail case of raid5 implement in 01f96c0a9922cd9919baf9d16febdf7016177a12~1. ------------------------------------------------- diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 43709fa..ac5e8b5 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4941,8 +4941,7 @@ static int run(mddev_t *mddev) return 0; abort: - md_unregister_thread(mddev->thread); - mddev->thread = NULL; + md_unregister_thread(&mddev->thread); if (conf) { print_raid5_conf(conf); free_conf(conf); ------------------------------------------------- I surmise the reason is: the run() procedure is running in the context which was pointed by mddev->thead itself. Is it right? But I have tried this way before I wrote this mail (actually before merge the commit of 3.1). It seems also work well. The "wakeup thread oops" disappeared too, and no another oops or KERN_ERR print output. Question just for my tech interest. Excuse me if it disturb you. Best Regards, Moore(莫谋鑫) ?韬{.n?????%??檩??w?{.n???{炳盯w???塄}?财??j:+v??????2??璀??摺?囤??z夸z罐?+?????w棹f