On Wed, Jan 07, 2015 at 07:34:37AM +1100, David Raffelt wrote:
> Hi Brian and Stefan,
> Thanks for your reply. I checked the status of the array after the rebuild
> (and before the reset).
>
> md0 : active raid6 sdd1[8] sdc1[4] sda1[3] sdb1[7] sdi1[5] sde1[1]
>       14650667520 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [UUUUUU_]
>
> However, given that I've never had any problems with mdadm rebuilds before,
> I did not think to check the data before rebooting. Note that the array is
> still in this state. Before the reboot I tried to run a smartctl check on
> the failed drives and it could not read them. When I rebooted I did not
> actually replace any drives; I just power cycled to see if I could
> re-access the drives that were thrown out of the array. According to
> smartctl they are completely fine.
>
> I guess there is no way I can re-add the old drives and remove the newly
> synced drive? Even though I immediately kicked all users off the system
> when I got the mdadm alert, it's possible a small amount of data was
> written to the array during the resync.
>
> It looks like the filesystem was not unmounted properly before the reboot:
> Jan 06 09:11:54 server systemd[1]: Failed unmounting /export/data.
> Jan 06 09:11:54 server systemd[1]: Shutting down.
>
> Here are the mount errors in the log after rebooting:
> Jan 06 09:15:17 server kernel: XFS (md0): Mounting Filesystem
> Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> Jan 06 09:15:17 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> Jan 06 09:15:17 server kernel: XFS (md0): metadata I/O error: block 0x400 ("xfs_trans_read_buf_map") error 117 numblks 16
> Jan 06 09:15:17 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> Jan 06 09:15:17 server kernel: XFS (md0): failed to read root inode

So it fails to read the root inode. You could also try to read said inode
via xfs_db (e.g., 'sb', 'p rootino', 'inode <ino#>', 'p') and see what it
shows. Are you able to run xfs_metadump against the fs? If so, and you're
willing/able to make the dump available somewhere (compressed), I'd be
interested to take a look to see what might be causing the difference in
behavior between repair and xfs_db. (Example invocations for both are
sketched at the end of this message.)

Brian

> xfs_repair -n -L also complains about a bad magic number.
>
> Unfortunately this 15TB RAID was part of a 45TB GlusterFS distributed
> volume. It was only ever meant to be a scratch drive for intermediate
> scientific results; however, inevitably, most users used it to store lots
> of data. Oh well.
>
> Thanks again,
> Dave
>
> On 6 January 2015 at 23:47, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > On Tue, Jan 06, 2015 at 05:12:14PM +1100, David Raffelt wrote:
> > > Hi again,
> > > Some more information... the kernel log shows the following errors
> > > were occurring after the RAID recovery, but before I reset the server.
> > >
> >
> > By "after the RAID recovery", you mean after the two drives had failed
> > out, one hot spare was activated, and the resync completed? It certainly
> > seems like something went wrong in this process. The output below looks
> > like it's failing to read in some inodes. Is there any stack trace
> > output that accompanies these error messages to confirm?
> >
> > I suppose I would try to verify that the array configuration looks
> > sane, but after the hot spare resync and then one or two other drive
> > replacements (was the hot spare ultimately replaced?), it's hard to say
> > whether it might be recoverable.
> >
> > Brian
> >
> > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
> > > Jan 06 00:00:27 server kernel: XFS (md0): metadata I/O error: block 0x36b106c00 ("xfs_trans_read_buf_map") error 117 numblks 16
> > > Jan 06 00:00:27 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
> > >
> > > Thanks,
> > > Dave
>
> --
> *David Raffelt (PhD)*
> Postdoctoral Fellow
>
> The Florey Institute of Neuroscience and Mental Health
> Melbourne Brain Centre - Austin Campus
> 245 Burgundy Street
> Heidelberg Vic 3084
> Ph: +61 3 9035 7024
> www.florey.edu.au
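
For reference, the xfs_db sequence Brian suggests above would look roughly
like the session below. This is a minimal sketch: it assumes the filesystem
lives on /dev/md0 as in the thread, and the inode number 128 is only a
typical example value (substitute whatever 'p rootino' actually prints).

    # open the device read-only so nothing is modified
    xfs_db -r /dev/md0
    xfs_db> sb 0          # select superblock 0
    xfs_db> p rootino     # print the root inode number, e.g. rootino = 128
    xfs_db> inode 128     # seek to that inode (use the number printed above)
    xfs_db> p             # dump the inode core; a garbage magic/version here
                          # would line up with the error 117 (EFSCORRUPTED)
                          # mount failures quoted above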
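
Likewise, the metadump Brian asks for could be produced along these lines.
Again a sketch: /dev/md0 and the output filename are assumptions. Note that
xfs_metadump copies only metadata (no file contents) and obfuscates
filenames by default, which is what you want for a dump shared publicly.

    # dump the filesystem's metadata to a file; -g shows progress
    xfs_metadump -g /dev/md0 md0.metadump

    # compress before uploading
    xz md0.metadump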
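
On the mdadm side, checking that "the array configuration looks sane" would
start with comparing the component superblocks, which also bears on David's
question about re-adding the kicked drives. The sketch below is illustrative
only, not a recommendation: the device glob is guessed from the md0 status
line above (the names of the two kicked drives aren't known from the
thread), and a forced assembly from stale members is a last resort that can
easily make things worse if anything was written after the drives dropped
out.

    # compare event counts and update times across all members,
    # including the drives that were kicked out of the array
    mdadm --examine /dev/sd[a-i]1 | egrep '/dev/sd|Events|Update'

    # a forced assembly from the pre-resync members would look like
    # this, but only after imaging the drives first
    mdadm --stop /dev/md0
    mdadm --assemble --force /dev/md0 <old member devices...>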