Hi there,

I fear one of our mainboards did not play nicely with our SSDs in a RAID1 configuration:

mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Jul 27 11:58:50 2012
     Raid Level : raid1
     Array Size : 250050533 (238.47 GiB 256.05 GB)
  Used Dev Size : 250050533 (238.47 GiB 256.05 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Aug 10 14:58:30 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed

       1       8       33        -      faulty spare   /dev/sdc1

It seems both drives ran into trouble at around the same time: sdc was taken offline first, but then sdd also had problems (see the log at the end of this email). The filesystem on top of the array (ext4) of course had no way of coping with this, other than going read-only.

The big questions of course are

(a) how to retrieve as much data as possible from the disks
(b) how to revive the RAID again

Any thoughts on what I should try first?
To tackle (a), I think I'll use ddrescue first, just to cover any mistake I might make later on.

Cheers

Carsten

Here's the start of the log:

Aug 10 14:57:30 gitmaster kernel: [10731321.352291] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Aug 10 14:57:30 gitmaster kernel: [10731321.352350] ata3.00: failed command: WRITE FPDMA QUEUED
Aug 10 14:57:30 gitmaster kernel: [10731321.352380] ata3.00: cmd 61/02:00:47:00:00/00:00:00:00:00/40 tag 0 ncq 1024 out
Aug 10 14:57:30 gitmaster kernel: [10731321.352380] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 10 14:57:30 gitmaster kernel: [10731321.352469] ata3.00: status: { DRDY }
Aug 10 14:57:30 gitmaster kernel: [10731321.352495] ata3: hard resetting link
Aug 10 14:57:30 gitmaster kernel: [10731321.352528] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Aug 10 14:57:30 gitmaster kernel: [10731321.352574] ata4.00: failed command: WRITE FPDMA QUEUED
Aug 10 14:57:30 gitmaster kernel: [10731321.352604] ata4.00: cmd 61/02:00:47:00:00/00:00:00:00:00/40 tag 0 ncq 1024 out
Aug 10 14:57:30 gitmaster kernel: [10731321.352605] res 40/00:00:47:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Aug 10 14:57:30 gitmaster kernel: [10731321.352695] ata4.00: status: { DRDY }
Aug 10 14:57:30 gitmaster kernel: [10731321.352721] ata4: hard resetting link
Aug 10 14:57:35 gitmaster kernel: [10731326.709171] ata3: link is slow to respond, please be patient (ready=0)
Aug 10 14:57:35 gitmaster kernel: [10731326.721137] ata4: link is slow to respond, please be patient (ready=0)
Aug 10 14:57:40 gitmaster kernel: [10731331.354487] ata3: COMRESET failed (errno=-16)
Aug 10 14:57:40 gitmaster kernel: [10731331.354518] ata3: hard resetting link
Aug 10 14:57:40 gitmaster kernel: [10731331.370448] ata4: COMRESET failed (errno=-16)
Aug 10 14:57:40 gitmaster kernel: [10731331.370480] ata4: hard resetting link
Aug 10 14:57:45 gitmaster kernel: [10731336.715383] ata3: link is slow to respond, please be patient (ready=0)
Aug 10 14:57:45 gitmaster kernel: [10731336.735346] ata4: link is slow to respond, please be patient (ready=0)
Aug 10 14:57:50 gitmaster kernel: [10731341.360692] ata3: COMRESET failed (errno=-16)
Aug 10 14:57:50 gitmaster kernel: [10731341.360723] ata3: hard resetting link
Aug 10 14:57:50 gitmaster kernel: [10731341.388654] ata4: COMRESET failed (errno=-16)
Aug 10 14:57:50 gitmaster kernel: [10731341.388686] ata4: hard resetting link
Aug 10 14:57:55 gitmaster kernel: [10731346.721587] ata3: link is slow to respond, please be patient (ready=0)
Aug 10 14:57:55 gitmaster kernel: [10731346.749571] ata4: link is slow to respond, please be patient (ready=0)
Aug 10 14:58:01 gitmaster /USR/SBIN/CRON[10885]: (root) CMD (cd /srv/gitorious && rake ultrasphinx:index RAILS_ENV=production > /dev/null 2>&1)
Aug 10 14:58:25 gitmaster kernel: [10731376.344429] ata3: COMRESET failed (errno=-16)
Aug 10 14:58:25 gitmaster kernel: [10731376.344464] ata3: limiting SATA link speed to 1.5 Gbps
Aug 10 14:58:25 gitmaster kernel: [10731376.344497] ata3: hard resetting link
Aug 10 14:58:25 gitmaster kernel: [10731376.424371] ata4: COMRESET failed (errno=-16)
Aug 10 14:58:25 gitmaster kernel: [10731376.424403] ata4: limiting SATA link speed to 1.5 Gbps
Aug 10 14:58:25 gitmaster kernel: [10731376.424436] ata4: hard resetting link
Aug 10 14:58:30 gitmaster kernel: [10731381.365521] ata3: COMRESET failed (errno=-16)
Aug 10 14:58:30 gitmaster kernel: [10731381.365554] ata3: reset failed, giving up
Aug 10 14:58:30 gitmaster kernel: [10731381.365585] ata3.00: disabled
Aug 10 14:58:30 gitmaster kernel: [10731381.365610] ata3.00: device reported invalid CHS sector 0
Aug 10 14:58:30 gitmaster kernel: [10731381.365643] ata3: EH complete
Aug 10 14:58:30 gitmaster kernel: [10731381.365675] sd 2:0:0:0: [sdc] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.365701] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.365748] sd 2:0:0:0: [sdc] CDB: Write(10): 2a 00 00 00 00 47 00 00 02 00
Aug 10 14:58:30 gitmaster kernel: [10731381.365816] end_request: I/O error, dev sdc, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.365844] end_request: I/O error, dev sdc, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.365871] md: super_written gets error=-5, uptodate=0
Aug 10 14:58:30 gitmaster kernel: [10731381.365900] md/raid1:md2: Disk failure on sdc1, disabling device.
Aug 10 14:58:30 gitmaster kernel: [10731381.365900] md/raid1:md2: Operation continuing on 1 devices.
Aug 10 14:58:30 gitmaster kernel: [10731381.453474] ata4: COMRESET failed (errno=-16)
Aug 10 14:58:30 gitmaster kernel: [10731381.453505] ata4: reset failed, giving up
Aug 10 14:58:30 gitmaster kernel: [10731381.453536] ata4.00: disabled
Aug 10 14:58:30 gitmaster kernel: [10731381.453565] ata4: EH complete
Aug 10 14:58:30 gitmaster kernel: [10731381.453596] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.453621] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.453669] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 00 47 00 00 02 00
Aug 10 14:58:30 gitmaster kernel: [10731381.453737] end_request: I/O error, dev sdd, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.453765] end_request: I/O error, dev sdd, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.453792] md: super_written gets error=-5, uptodate=0
Aug 10 14:58:30 gitmaster kernel: [10731381.453867] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.453894] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.453941] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 00 47 00 00 02 00
Aug 10 14:58:30 gitmaster kernel: [10731381.454010] end_request: I/O error, dev sdd, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.454036] end_request: I/O error, dev sdd, sector 71
Aug 10 14:58:30 gitmaster kernel: [10731381.454064] md: super_written gets error=-5, uptodate=0
Aug 10 14:58:30 gitmaster kernel: [10731381.454136] RAID1 conf printout:
Aug 10 14:58:30 gitmaster kernel: [10731381.454140] --- wd:1 rd:2
Aug 10 14:58:30 gitmaster kernel: [10731381.454143] disk 0, wo:0, o:1, dev:sdd1
Aug 10 14:58:30 gitmaster kernel: [10731381.454146] disk 1, wo:1, o:0, dev:sdc1
Aug 10 14:58:30 gitmaster kernel: [10731381.477438] RAID1 conf printout:
Aug 10 14:58:30 gitmaster kernel: [10731381.477442] --- wd:1 rd:2
Aug 10 14:58:30 gitmaster kernel: [10731381.477446] disk 0, wo:0, o:1, dev:sdd1
Aug 10 14:58:30 gitmaster kernel: [10731381.477477] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.477514] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.477562] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 0e c7 da 6f 00 00 18 00
Aug 10 14:58:30 gitmaster kernel: [10731381.477630] end_request: I/O error, dev sdd, sector 247978607
Aug 10 14:58:30 gitmaster kernel: [10731381.477728] Aborting journal on device md2-8.
Aug 10 14:58:30 gitmaster kernel: [10731381.477774] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.477802] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.477851] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 0e c4 08 3f 00 00 08 00
Aug 10 14:58:30 gitmaster kernel: [10731381.477922] end_request: I/O error, dev sdd, sector 247728191
Aug 10 14:58:30 gitmaster kernel: [10731381.477944] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.477945] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.477947] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 08 3f 00 00 08 00
Aug 10 14:58:30 gitmaster kernel: [10731381.477950] end_request: I/O error, dev sdd, sector 2111
Aug 10 14:58:30 gitmaster kernel: [10731381.477982] Buffer I/O error on device md2, logical block 0
Aug 10 14:58:30 gitmaster kernel: [10731381.477983] lost page write due to I/O error on md2
Aug 10 14:58:30 gitmaster kernel: [10731381.478011] EXT4-fs error (device md2): ext4_journal_start_sb:327: Detected aborted journal
Aug 10 14:58:30 gitmaster kernel: [10731381.478013] EXT4-fs (md2): Remounting filesystem read-only
Aug 10 14:58:30 gitmaster kernel: [10731381.478014] EXT4-fs (md2): previous I/O error to superblock detected
Aug 10 14:58:30 gitmaster kernel: [10731381.478052] sd 3:0:0:0: [sdd] Unhandled error code
Aug 10 14:58:30 gitmaster kernel: [10731381.478054] sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug 10 14:58:30 gitmaster kernel: [10731381.478055] sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 08 3f 00 00 08 00
Aug 10 14:58:30 gitmaster kernel: [10731381.478059] end_request: I/O error, dev sdd, sector 2111
Aug 10 14:58:30 gitmaster kernel: [10731381.478078] Buffer I/O error on device md2, logical block 0
Aug 10 14:58:30 gitmaster kernel: [10731381.478079] lost page write due to I/O error on md2
Aug 10 14:58:30 gitmaster kernel: [10731381.485182] Buffer I/O error on device md2, logical block 30965760
Aug 10 14:58:30 gitmaster kernel: [10731381.485184] lost page write due to I/O error on md2
Aug 10 14:58:30 gitmaster kernel: [10731381.485190] JBD2: I/O error detected when updating journal superblock for md2-8.
Aug 10 14:58:30 gitmaster mdadm[1470]: Fail event detected on md device /dev/md/2, component device /dev/sdc1

-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
phone/fax: +49 511 762-17185 / -17193
https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome
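
As a cross-check on the log above, the sector reported in each end_request line can be recovered from the CDB bytes the kernel printed just before it. The following is only a rough sketch and not part of the original report: it assumes the standard WRITE(10) layout (opcode 0x2a in byte 0, big-endian LBA in bytes 2-5, transfer length in sectors in bytes 7-8), and the function name decode_write10 is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decode the LBA and length of a SCSI WRITE(10) CDB.
# Arguments: the ten CDB bytes as lowercase hex, exactly as the kernel prints them.
decode_write10() {
    # Expect ten bytes, the first being the WRITE(10) opcode 0x2a.
    if [ "$#" -ne 10 ] || [ "$1" != "2a" ]; then
        echo "not a WRITE(10) CDB" >&2
        return 1
    fi
    # $3..$6 are CDB bytes 2-5 (big-endian LBA); $8 and $9 are bytes 7-8
    # (transfer length in sectors).
    echo "lba=$(( 0x${3}${4}${5}${6} )) sectors=$(( 0x${8}${9} ))"
}

# CDB bytes copied from the sdd entries in the log.
decode_write10 2a 00 0e c7 da 6f 00 00 18 00   # prints lba=247978607 sectors=24
decode_write10 2a 00 00 00 00 47 00 00 02 00   # prints lba=71 sectors=2
```

The first result matches the "sector 247978607" error on sdd. The second matches sector 71 with a 2-sector write, which at 512 bytes per sector is the same 1024 bytes shown in the "ncq 1024 out" lines for ata3 and ata4.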