Dear Neil,

Thank you for the clue. I traced deeper after that and found another suspicious point: the kernel may wake up an mdk_thread that has already been kfree'd in the failure path of mddev->pers->run(mddev).

To confirm my guess, I added some printk calls to get more information. It seems that md_seq_show tries to wake up the mdk_thread in mddev_unlock() before md_seq_show returns, but the thread was already freed by the "out_free_conf" path of the run() function in drivers/md/raid10.c.

The Oops report is attached.

----------------- Patch of my debug code ---------------------------
Index: drivers/md/md.c
===================================================================
--- drivers/md/md.c (revision 1911)
+++ drivers/md/md.c (working copy)
@@ -675,6 +675,8 @@
 	} else
 		mutex_unlock(&mddev->reconfig_mutex);

+	if(mddev->thread)
+		printk("%s:%d Try to md_wakeup_thread(%p) of dev(%p)\n", __func__, __LINE__, mddev->thread, mddev);
 	md_wakeup_thread(mddev->thread);
 }

@@ -4531,7 +4533,7 @@

 	err = mddev->pers->run(mddev);
 	if (err)
-		printk(KERN_ERR "md: pers->run() failed ...\n");
+		printk(KERN_ERR "md: mddev(%p) pers->run() failed ...\n", mddev);
 	else if (mddev->pers->size(mddev, 0, 0) < mddev->array_sectors) {
 		WARN_ONCE(!mddev->external_size, "%s: default size too small,"
 			  " but 'external_size' not in effect?\n", __func__);
@@ -6451,6 +6453,7 @@
 		seq_printf(seq, "\n");
 	}

+	printk("%s:%d Try to mddev_unlock(%p)\n", __func__, __LINE__, mddev);
 	mddev_unlock(mddev);

 	return 0;
Index: drivers/md/raid10.c
===================================================================
--- drivers/md/raid10.c (revision 1911)
+++ drivers/md/raid10.c (working copy)
@@ -2398,7 +2398,9 @@
 	return 0;

 out_free_conf:
+	printk("%s:%d Before md_unregister_thread(%p), Queue=%p\n", __func__, __LINE__, mddev->thread, mddev->thread?&mddev->thread->wqueue:NULL);
 	md_unregister_thread(mddev->thread);
+	printk("%s:%d md_unregister_thread(%p) done.\n", __func__, __LINE__, mddev->thread);
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
 	safe_put_page(conf->tmppage);
--------------------------------------------------------------------

Best Regards,
Moore(莫谋鑫)

-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx]
Sent: April 29, 2013 9:40
To: Mo, Moore
Cc: jbrassow@xxxxxxxxxx
Subject: Re: Kernel crash at md_seq_show of drivers/md/md.c

Hi,
 thanks for the report. For future reference, it is best to post questions like this to linux-raid@xxxxxxxxxxxxxxx as listed in the MAINTAINERS file.

I couldn't find an "Oops" output attached.... (see below)

On Thu, 25 Apr 2013 11:05:45 +0800 "Mo, Moore" <Moore.Mo@xxxxxxxxxxx> wrote:

> Dear Neil & Jonathan,
>
> Sorry to disturb you. I got your mail addresses from git HEAD. I think you are the right people to help with a drivers/md crash that I have hit.
> I am seeing a NULL pointer kernel Oops during exception coverage testing. I deliberately put an existing RAID volume into "Fail" status via the BIOS SATA OROM utility, and then the kernel (2.6.37.6 x86_64) crashed while running "mdadm -l10 …" from /etc/rc3.d. The log of the Oops output is enclosed.
>
> I walked through the code path following the backtrace of the Oops and found something questionable. It seems that "md_seq_show" does NOT handle the NULL that md_seq_start may return.
> // excerpt from seq_read
> ssize_t seq_read(struct file *file, char __user *buf, size_t size,
>                  loff_t *ppos) {
>         struct seq_file *m = file->private_data;
>         size_t copied = 0;
>         loff_t pos;
>
>         ………….
>
>         /* we need at least one record in buffer */
>         pos = m->index;
>         p = m->op->start(m, &pos); // md_seq_start may return NULL.
>         while (1) {
>                 err = PTR_ERR(p);
>                 if (!p || IS_ERR(p))
>                         break;

If md_seq_start returned NULL, then the above "if" would notice that "!p" is true, and would break out of the loop. So the following line will never get executed with 'p == NULL'.

So I don't know what the problem might be. Maybe if you can post the Oops report (to linux-raid - don't send any HTML, just plain text) I might be able to see what is happening.

NeilBrown

>                 err = m->op->show(m, p);
>                 if (err < 0)
>                         break;
>                 if (unlikely(err))
>                         m->count = 0;
>
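
For readers following the thread, the suspected failure at the top of this message (a wake-up through mddev->thread after the thread was freed in run()'s error path) can be modeled in user space. This is only a sketch under stated assumptions: model_thread, model_mddev and every model_* function are hypothetical stand-ins, not kernel APIs, and clearing the pointer after unregistering is one possible remedy, not a fix confirmed anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's thread and mddev objects. */
struct model_thread { int wakeups; };
struct model_mddev  { struct model_thread *thread; };

/*
 * Takes the thread by value, like the md_unregister_thread(mddev->thread)
 * call in the debug patch, so it cannot clear the caller's pointer.
 */
static void model_unregister_thread(struct model_thread *t)
{
	free(t);
}

/*
 * A NULL check can skip a missing thread, but it cannot tell a live
 * thread from a stale, already-freed one. Returns 1 if a wake-up was
 * delivered, 0 if there was no thread.
 */
static int model_wakeup_thread(struct model_thread *t)
{
	if (!t)
		return 0;
	t->wakeups++;
	return 1;
}

/* Error path that also clears the pointer (the assumed remedy). */
static void model_run_failed(struct model_mddev *m)
{
	model_unregister_thread(m->thread);
	m->thread = NULL;	/* without this line the pointer dangles */
}

/* Returns what a later wake-up attempt does after the failed run(). */
static int model_demo(void)
{
	struct model_mddev m;

	m.thread = calloc(1, sizeof(*m.thread));
	if (!m.thread)
		return -1;

	/* While the thread is alive, a wake-up is delivered normally. */
	assert(model_wakeup_thread(m.thread) == 1);

	model_run_failed(&m);

	/* The unlock-time wake-up now sees NULL and is skipped. */
	return model_wakeup_thread(m.thread);
}
```

Calling model_demo() exercises both paths. If the m->thread = NULL assignment is removed from model_run_failed(), the final wake-up dereferences freed memory, which is the pattern the printk output is meant to confirm.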
Attachment: RAID_crash.log