On Sat, Jan 22, 2011 at 07:41:18PM +0100, Stygge wrote:
> On Fri, 21 Jan 2011 at 13:46, Arno Wagner wrote:
> >
> > The keyslots themselves do not fit. In principle,
> > if there are no bad sectors in the keyslot-0 area, the
> > keyslot should still be intact, but there is some reason
> > the RAID manager kicked two disks, and it seems your
> > keyslot now _is_ corrupt.
> >
> > In fact I am a bit surprised. A failed RAID array should
> > not come up again by itself without manual intervention,
> > even if the disks work fine again.
> >
> > Ok, first an emergency backup so as not to make things worse:
> > Do a LUKS header backup as described in the FAQ
> > (http://code.google.com/p/cryptsetup/wiki/FrequentlyAskedQuestions)
>
> Great minds etc. - already did that :-) ;-)
>
> > Then check the RAID status (cat /proc/mdstat).
>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>       2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>
> unused devices: <none>

Good.

> > Can you post the RAID info (or send it directly to me)?
> > Command is mdadm -D /dev/md<nr>
>
> # mdadm -D /dev/md0
> /dev/md0:
>         Version : 0.90
>   Creation Time : Wed Jan 19 22:16:47 2011

That does look bad, unless you really created the original array on
Wednesday last week. It looks more like the array was re-created from
the original disks, but not from the original superblock. I suspect
this could lead to a different parity disk or a different disk order
in the array. It _is_ possible that I am reading this wrong and the
creation time gets updated on whatever recovery was done.

Question is, how did the array recover? It clearly did not do so by
itself. Which distribution is this?
>      Raid Level : raid5
>      Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
>   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Jan 19 23:13:50 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : c4adbb76:fa093a35:2246d218:877ebf8a
>          Events : 0.2
>
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8       33        2      active sync   /dev/sdc1
>        3       8       49        3      active sync   /dev/sdd1

The rest looks fine.

> > You can also run a RAID consistency check, which is
> > a bit tricky. The procedure is to read
> > /sys/block/md<nr>/md/mismatch_cnt to make sure it
> > is zero, then echo 'check' into
> > /sys/block/md<nr>/md/sync_action, wait until
> > the check is finished (as visible in /proc/mdstat)
> > and read mismatch_cnt again. If it is >0,
> > then your RAID is in an inconsistent state.
> > This test should not make changes to the disks.
>
> # cat /sys/block/md0/md/mismatch_cnt
> 0
> # echo check >/sys/block/md0/md/sync_action
> # cat /sys/block/md0/md/mismatch_cnt
> 72

Whoops, 72 mismatches on an array that was last synced a few days
ago? Maybe this is actually a sign of one or more dying disks.
Anything other than a zero result is bad.

> > If the RAID did not assemble again properly,
> > manual intervention and assembly may be necessary
> > in order to unlock and save the data.
>
> mdadm --assemble --scan works fine and sets up the raid just fine.

Yes. But after 2 disks were kicked from a RAID5, it should not do
that unless you force it to. And if you force it, and it guesses
wrong about which disk was kicked first, it will mix the state as of
when the first disk was kicked with the state as of when the second
disk was kicked, and overwrite the disk kicked second in the process.
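The consistency-check procedure quoted above can be put into a small
script. This is a minimal sketch, assuming the array is md0 and the
script runs as root; the MD path, the 10-second polling interval, and
the function name raid_check are my assumptions, not part of the
original procedure.

```shell
#!/bin/sh
# Minimal sketch of the md consistency check described above.
# Assumption: the array is md0; adjust MD for another array number.
MD="${MD:-/sys/block/md0/md}"

raid_check() {
    # The counter should be zero before the check starts.
    before=$(cat "$MD/mismatch_cnt")

    # Trigger the check by writing to sync_action (mismatch_cnt
    # itself is only read).
    echo check > "$MD/sync_action"

    # Wait until the check is finished, as visible in /proc/mdstat
    # (a running check shows up there as "check = NN.N%").
    while grep -q check /proc/mdstat 2>/dev/null; do
        sleep 10
    done

    # Re-read the counter; anything above zero means the array is
    # in an inconsistent state.
    after=$(cat "$MD/mismatch_cnt")
    echo "mismatch_cnt: $before -> $after"
}
```

This should not write to the disks themselves; the check only reads
the stripes and counts parity mismatches.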
As far as I can see, this would result in a mixed disk state:
everything written between the first and second kick would be
corrupt. The key-slot, however, would not be, unless it was changed
in between. It might just have a wrong disk order. Again, if the
array did assemble from the existing superblocks, the order will be
right.

> I *really* hope that I'm not completely scr*w*d :-(

Impossible to tell at this time. Ok, next steps:

1. Post or send me a long SMART status for each disk
   (smartctl -a /dev/<disk>). These 72 inconsistencies are not good
   and need to be looked at.

2. Can you give me the header backup (a bit more than 1MB)? I cannot
   break your security that way, but looking at the borders of the
   header and keyslot may tell me whether the disk order was mixed
   up. This may allow reshuffling the disks into the correct order
   (if that is the problem). The procedure would be to produce one
   or more reshuffled headers and give them back to you, to see
   whether any of them allows an unlock. If so, the next step would
   be to reshuffle the whole array (which would still be
   inconsistent).

3. You could also give me the first 260kB of each disk; then I can
   check whether any of the 72 inconsistencies is in the key-slot
   area. Again, this does not allow me to break your security. And
   again, this would allow creating an alternate header that could
   work and allow you to unlock.

These are my best guesses. You can do all of that yourself as well:
refer to the FAQ for details of the on-disk LUKS structure, and to
Wikipedia for RAID5, if you plan to. There should also be
information on how to interpret SMART data on the web, although I
have some long-term experience in that area.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno@xxxxxxxxxxx
GnuPG: ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it.
The very definition of "news" is "something that hardly ever
happens." -- Bruce Schneier
_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
http://www.saout.de/mailman/listinfo/dm-crypt
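For step 3 in the mail above (the first 260kB of each disk), a
minimal sketch using dd. The device names match the array in this
thread; the function names, output filenames, and the decision to
interpret 260kB as 260 blocks of 1024 bytes are my assumptions.

```shell
#!/bin/sh
# Sketch: copy the first 260 kB of a device (or any file) so the
# LUKS header and keyslot area can be inspected offline.
dump_head() {
    src=$1
    dst=$2
    # 260 blocks of 1024 bytes = 266240 bytes from the start.
    dd if="$src" of="$dst" bs=1024 count=260 2>/dev/null
}

# Collect the head of every RAID member. The partition names are
# assumptions matching the mdadm output in this thread.
dump_all() {
    for d in sda1 sdb1 sdc1 sdd1; do
        [ -b "/dev/$d" ] && dump_head "/dev/$d" "head-$d.img"
    done
}
```

On the real machine you would run dump_all as root and send the
resulting head-*.img files; they contain the LUKS header and keyslot
region but, as noted above, do not by themselves allow breaking the
encryption.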