RE: Need help recovering RAID5 array


 



> -----Original Message-----
> From: NeilBrown [mailto:neilb@xxxxxxx]
> Sent: Monday, August 08, 2011 10:56 PM
> To: Muskiewicz, Stephen C
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: Need help recovering RAID5 array
> 
> >
> > This does lead to a question: Do you recommend (and is it safe on CentOS
> > 5.5?) for me to use the updated (3.2.2 with your patch) version of mdadm
> > going forward in place of the CentOS version (2.6.9)?
> 
> I wouldn't keep that patch.  It was a little hack to get your array working
> again.  I wouldn't recommend using it without expert advice...
> 
> Other than that ... 3.2.2 certainly fixes bugs and adds features over 2.6.9,
> but maybe it adds some bugs too...  I would say that it is safe, but probably
> not really necessary.
> i.e. up to you :-)
> 

OK, I'll probably stick with 2.6.9 for now and focus on getting our other thumper server updated to CentOS 6 instead. Oh yeah, and on getting the UPS control software set up so it actually shuts the box down cleanly and this hopefully doesn't happen again! ;-)

> >
> > > I wonder how the event count got that high.  There aren't enough seconds
> > > since the birth of the universe for it to have happened naturally...
> > >
> > Any chance it might be related to these kernel messages? I just noticed
> > (guess I should be paying more attention to my logs) that there are tons
> > of these messages repeated in my /var/log/messages file.  However, as far
> > as the RAID arrays themselves go, we haven't seen any problems while they
> > are running, so I'm not sure what's causing these or whether they are
> > insignificant.  Again, speculation on my part, but given the huge event
> > count from mdadm and the number of these messages, it seems they might
> > somehow be related....
> >
> > Jul 31 04:02:13 libthumper1 kernel: program diskmond is using a deprecated SCSI ioctl, please convert it to SG_IO
> > Jul 31 04:02:26 libthumper1 last message repeated 47 times
> > Jul 31 04:12:11 libthumper1 kernel: md: bug in file drivers/md/md.c, line 1659
> 
> I need to know the exact kernel version to find out what this line is....
> I could guess but I would probably be wrong.
> 
> > Jul 31 04:12:11 libthumper1 kernel:
> > Jul 31 04:12:11 libthumper1 kernel: md: **********************************
> > Jul 31 04:12:11 libthumper1 kernel: md: * <COMPLETE RAID STATE PRINTOUT> *
> > Jul 31 04:12:11 libthumper1 kernel: md: **********************************
> > Jul 31 04:12:11 libthumper1 kernel: md53: <sdk1><sdai1><sds1><sdam1><sdo1><sdau1><sdaq1><sdw1><sdaa1><sdae1>
> > Jul 31 04:12:11 libthumper1 kernel: md: rdev sdk1, SZ:488383744 F:0 S:1 DN:10
> > Jul 31 04:12:11 libthumper1 kernel: md: rdev superblock:
> > Jul 31 04:12:11 libthumper1 kernel: md:  SB: (V:1.0.0) ID:<be475f67.00000000.00000000.00000000> CT:81f4e22f
> > Jul 31 04:12:11 libthumper1 kernel: md:     L-2009873429 S1801675106 ND:1834971253 RD:1869771369 md114 LO:65536 CS:196610
> > Jul 31 04:12:11 libthumper1 kernel: md:     UT:00000000 ST:0 AD:976767728 WD:0 FD:976767984 SD:0 CSUM:00000000 E:00000000
> > Jul 31 04:12:11 libthumper1 kernel:      D  0:  DISK<N:-1,(-1,-1),R:-1,S:-1>
> > Jul 31 04:12:11 libthumper1 kernel:      D  1:  DISK<N:-1,(-1,-1),R:-1,S:-1>
> > Jul 31 04:12:11 libthumper1 kernel:      D  2:  DISK<N:-1,(-1,-1),R:-1,S:-1>
> > Jul 31 04:12:11 libthumper1 kernel:      D  3:  DISK<N:-1,(-1,-1),R:-1,S:-1>
> > Jul 31 04:12:11 libthumper1 kernel: md:     THIS: DISK<N:0,(0,0),R:0,S:0>
> > Jul 31 04:12:11 libthumper1 kernel: md: rdev superblock:
> > Jul 31 04:12:11 libthumper1 kernel: md:  SB: (V:1.0.0) ID:<be475f67.00000000.00000000.00000000> CT:81f4e22f
> > Jul 31 04:12:11 libthumper1 kernel: md:     L-2009873429 S1801675106 ND:1834971253 RD:1869771369 md114 LO:65536 CS:196610
> > Jul 31 04:12:11 libthumper1 kernel: md:     UT:00000000 ST:0 AD:976767728 WD:0 FD:976767984 SD:0 CSUM:00000000 E:00000000
> >
> > <snip...and on and on>
> 
> Did it really start repeating at this point?  I would have expected a bit
> more first.
> 
> So if you get me the kernel version and confirm that this really is all in
> the logs except for identical repeats, I'll see if I can figure out what
> might have caused it - and then whether it could be related to your
> original problem.
> 

Yes, you're right, there is quite a bit more info in the logs between the "bug in file ... line 1659" messages.  It looks to be a state dump for each device in the array.  I'll save the bandwidth and not paste all of that in here unless you need it.  But I have confirmed that all of the bug lines are for the same line number (approximately 60,000 occurrences in the old backup of the messages file alone):

libthumper1 kernel: md: bug in file drivers/md/md.c, line 1659
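Because syslogd folds identical consecutive lines into "last message repeated N times" (like the "last message repeated 47 times" line quoted earlier), a plain grep count of the bug line can undercount. Below is a minimal sketch of a tally script, assuming Python 3 and a classic single-host syslog messages file; the script and the count_md_bugs function are hypothetical names, not part of mdadm or any CentOS tool. It counts "md: bug in file ..., line N" messages per source line and adds any repeat notice that directly follows one.

```python
import re
import sys
from collections import Counter

# Matches kernel messages such as:
#   kernel: md: bug in file drivers/md/md.c, line 1659
BUG_RE = re.compile(r"md: bug in file (\S+), line (\d+)")
# syslogd collapses identical consecutive messages into this form.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_md_bugs(lines):
    """Return a Counter mapping (source file, line number) -> occurrences.

    A "last message repeated N times" notice is credited to a bug line only
    when it immediately follows that bug line; any other message (e.g. the
    RAID state dump) breaks the chain.
    """
    counts = Counter()
    last_key = None  # (file, line) of the previous message if it was a bug line
    for text in lines:
        bug = BUG_RE.search(text)
        if bug:
            last_key = (bug.group(1), int(bug.group(2)))
            counts[last_key] += 1
            continue
        rep = REPEAT_RE.search(text)
        if rep:
            if last_key is not None:
                counts[last_key] += int(rep.group(1))
            continue
        last_key = None  # unrelated message: later repeats are not ours
    return counts


def main(path):
    # errors="replace" keeps the scan going past any undecodable bytes.
    with open(path, errors="replace") as f:
        counts = count_md_bugs(f)
    for (src, lineno), n in counts.most_common():
        print(f"{src}:{lineno}\t{n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python3 count_md_bugs.py /var/log/messages`; each output line is `file:line`, a tab, and the total. Repeat notices that follow the state-dump lines refer to those dump lines, which is why only a notice directly after a bug line is added to its count.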

Here's the kernel version and RPM info:

[root@libthumper1 ~]# uname -a
Linux libthumper1.uml.edu 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@libthumper1 ~]# rpm -qi kernel-2.6.18-194.32.1.el5
Name        : kernel                       Relocations: (not relocatable)
Version     : 2.6.18                            Vendor: CentOS
Release     : 194.32.1.el5                  Build Date: Wed 05 Jan 2011 08:44:05 PM EST
Install Date: Tue 25 Jan 2011 03:13:55 PM EST      Build Host: builder10.centos.org
Group       : System Environment/Kernel     Source RPM: kernel-2.6.18-194.32.1.el5.src.rpm
Size        : 96513754                         License: GPLv2
Signature   : DSA/SHA1, Thu 06 Jan 2011 07:16:03 AM EST, Key ID a8a447dce8562897
URL         : http://www.kernel.org/
<snip>

Let me know if I can provide any other useful info.

Again, many thanks for all your help!

Cheers,
-steve


