On Thu, 16 Jul 2015 20:14:21 +0200 Fabian Fischer <raid@xxxxxxxxxxxxxxxxx> wrote:

> Hi,
> today I had some problems with my mdadm raid5 (4 disks). First I will
> try to explain what happened and what the result is:
>
> One disk in my array has some bad blocks. After some hardware changes,
> one of the intact disks was thrown out of the array due to a faulty
> SATA cable.
> I shut down the server and replaced the cable.
> After booting, the removed disk wasn't re-added to the array (maybe
> because of a different event count). --re-add doesn't work.
> So I used --add.

You need a bitmap configured for --re-add to be useful.
(It is generally a good idea anyway.)

> Because of the bad blocks on one of the remaining disks, the rebuild
> stops when reaching the first bad block. The re-added disk is declared
> a spare, 2 disks are active, and the disk with bad blocks is marked
> faulty.

That shouldn't happen.
Your devices all have a bad block log present, so when rebuilding hits a
bad block it should just record that as a bad block on the recovering
disk and keep going. That is the whole point of the bad block log.

What kernel are you running? And what version of mdadm?

> /dev/md127:
>         Version : 1.2
>   Creation Time : Tue Apr 19 08:51:32 2011
>      Raid Level : raid5
>      Array Size : 5860538880 (5589.05 GiB 6001.19 GB)
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Jul 16 19:02:09 2015
>           State : clean, FAILED
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : FiFa-Server:0
>            UUID : 839fb405:d0b1f13a:5a55ee42:fc8a2061
>          Events : 107223
>
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       80        1      active sync   /dev/sdf
>        5       8       32        2      active sync   /dev/sdc
>        6       0        0        6      removed
>
>        4       8       96        -      faulty   /dev/sdg
>        6       8       64        -      spare   /dev/sde
>
> In my opinion there are 3 possibilities to get the array back working.
> I am not sure whether these possibilities really exist, or which one is
> the most promising:
> - Using the 'spare' disk as an active disk. The data on the disk
>   should still be there.
> - Ignoring the bad blocks and losing the information stored in those
>   blocks.
> - Force-starting the array without the 'spare' disk and copying the
>   data to backup storage - or will the bad blocks cause the array to
>   fail when reaching one of them?

If you have somewhere to store backed-up data, and if you can assemble
the array with "--assemble --force", then taking that approach and
copying all the data to somewhere else is the safest option.

To do anything else would require a clear understanding of the history
of the array. Maybe re-creating the array using the 3 "best" devices
would help, but you would want to be really sure of what you were doing.

The data in the recorded bad blocks is probably lost already anyway -
hopefully there is nothing critical there.

Given that the update times on the superblocks are very close, the array
is probably quite consistent. I think "--assemble --force", listing the
three devices whose --examine output says "Device Role : Active device
...", should work and give you a degraded array. Then 'fsck' that and
copy the data off. Then maybe recreate the array from scratch.
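For example (just a sketch - the device names are taken from the
--detail output above, and sdg's role is assumed from your description,
so double-check everything against --examine before running it):

  # stop the half-assembled array
  mdadm --stop /dev/md127

  # force-assemble from the three members that were active
  # (assumed here to be sdf, sdc and sdg; sde is the spare)
  mdadm --assemble --force /dev/md127 /dev/sdf /dev/sdc /dev/sdg

  # check the filesystem without modifying it, then mount read-only
  # and copy the data somewhere safe
  fsck -n /dev/md127
  mount -o ro /dev/md127 /mnt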
NeilBrown

> In the attachment you can find the output of --examine.
> I cannot explain why 3 disks have a Bad Block Log.
> According to the SMART values, only sdg has Reallocated_Sector_Ct > 0.
> Another thing I can't explain is why sdg (which is the disk with known
> bad blocks) has a lower event count.
>
> I hope I can get some great ideas on how to fix my array.
>
> Fabian
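As for the bad block logs: if I remember correctly, recent mdadm
allocates a bad block log whenever it writes a fresh superblock (e.g. on
--add), so a log being present doesn't mean any sectors are actually
recorded in it. You can list the recorded entries per member, and
compare them with SMART, along these lines:

  # bad blocks recorded in the md metadata of this member
  mdadm --examine-badblocks /dev/sdg

  # the drive's own SMART attributes, including Reallocated_Sector_Ct
  smartctl -A /dev/sdg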