Thank you all for your kind replies. One extra thought and question: would
it help if I had some large file that is also on the array? Could I search
for a part of that file on the individual drives, or at least on the
permuted arrays?

I tried findsuper, but it keeps finding the same backup superblocks no
matter how I switch the order of the disks, except for the first one. It
may be, I think, that those superblocks all fall on the same disk. That is
only a general impression, so I am running it thoroughly on a smaller
array to make sure. Another issue with this approach is that it takes a
lot of time: I have a 4.5 TB array with 720 permutations to try. This
sounds more like a job for a few years.

2009/12/10 <tytso@xxxxxxx>:
> Something that may help is to use the findsuper program, in the
> e2fsprogs sources; it's not built by default, but you can build it by
> hand.
> The group number information should help you determine the order of the
> disks in the raid array.

Same issue if I use the used inode count: the permutations yield the same
numbers over and over again. I think dumpe2fs -h doesn't go into the
actual drive, but only reads the descriptors at the beginning, and these
fall on the same drive...

2009/12/10 Christian Kujau <lists@xxxxxxxxxxxxxxx>:
> On Wed, 9 Dec 2009 at 20:50, Lucian Șandor wrote:
>> Question 1: Is there a way to make dumpe2fs or another command
>> estimate the number of files in what appears to be an ext3 partition?
>
> I can only think of:
> $ dumpe2fs -h /dev/loop0 | egrep 'Inode count|Free inodes'
> The difference between both values should be the used inodes, i.e.
> files/directories on the filesystem.

2009/12/10 Andreas Dilger <adilger@xxxxxxx>:
> On 2009-12-09, at 18:50, Lucian Șandor wrote:
>>
>> However, no combination seems useful. Sometimes I get:
>> "e2fsck: Bad magic number in super-block while trying to open /dev/md0"
>> Other times I get:
>> "Superblock has an invalid journal (inode 8)."
>> Other times I get:
>> "e2fsck: Illegal inode number while checking ext3 journal for /dev/md2."
>> None of these appears in only one permutation, so none is indicative
>> of the correctness of the permutation.
>
> You need to know a bit about your RAID layout and the structure of ext*.
> One thing that is VERY important is whether your new MD config has the
> same chunk size as it did initially. It will be impossible to recover
> your config if you don't have the same chunk size.
>
> Also, if you haven't disabled RAID resync then it may well be that
> changing the RAID layout has caused a resync that has permanently
> corrupted your data.

I have the chunk size for one of the arrays. I thought that mdadm would
automatically use the same values it used when it first created the
arrays, but guess what, it did not. Now I have another headache for the
other array. The arrays were degraded at the time of the whole mess, and I
have always re-created them as degraded. I wonder how long I can keep
pulling off this feat, after being so messy in the first place.

> That said, I will assume the primary ext3 superblock will reside on the
> first disk in the RAID set, since it is located at an offset of 1kB from
> the start of the device.
>
> You should build and run the "findsuper" tool that is in the e2fsprogs
> source tree. It will scan the raw disk devices and locate the ext3
> superblocks. Each superblock contains the group number in which it is
> stored, so you can find the first RAID disk by looking for the one that
> has superblock 0 at offset 1kB from the start of the disk.
>
> There may be other copies of the superblock #0 stored in the journal
> file, but those should be ignored.
>
> The backup superblocks have a non-zero group number, and "findsuper"
> prints the offset at which that superblock should be located from the
> start of the LUN. Depending on whether you have a non-power-of-two
> number of disks in your RAID set, you may find the superblock copies on
> different disks, and you can do some math to determine which order the
> disks should be in by computing the relative offset of the superblock
> within the RAID set.
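For the record, here is that scan as I understand it, written out as a
slow Python sketch (the device name is only an example; the real
findsuper in the e2fsprogs tree is far faster and better tested):

    #!/usr/bin/env python3
    # Slow sketch of the findsuper scan: walk a raw RAID component in
    # 1 kB steps and report every ext2/ext3 superblock found, with the
    # group number (s_block_group_nr) stored inside it.
    import struct
    import sys

    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb"  # example only
    CHUNK = 4 << 20                 # read 4 MiB at a time
    STEP = 1024                     # superblock copies are 1 kB aligned

    with open(dev, "rb") as f:
        pos = 0
        while True:
            buf = f.read(CHUNK)
            if len(buf) < 92:       # need bytes up to s_block_group_nr
                break
            for off in range(0, len(buf) - 92, STEP):
                # s_magic is a little-endian u16 at byte 56 of the superblock
                if struct.unpack_from("<H", buf, off + 56)[0] != 0xEF53:
                    continue
                group, = struct.unpack_from("<H", buf, off + 90)
                print(f"superblock candidate at byte {pos + off}, group {group}")
            pos += len(buf)

Whichever disk reports group 0 at byte 1024 should be the first disk of
the set; any further group-0 hits are presumably the journal copies you
mention, to be ignored.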
> The other thing that can help order the disks (depending on the RAID
> chunksize and the total number of groups in the filesystem, proportional
> to the filesystem size) is the group descriptor table. It is located
> immediately after the superblocks, and contains a very regular list of
> block numbers for the block and inode bitmaps, and the inode table in
> each group.
>
> Using "od -Ax -tx4" on a regular ext3 filesystem you can see the group
> descriptor table starting at offset 0x1000, and the block numbers
> basically just "count" up. This may in fact be the easiest way to order
> the disks, if the group descriptor table is large enough to cover all of
> the disks:
>
> # od -Ax -tx4 /dev/hda1 | more
> :
> :
> 001000 0000012c 0000012d 0000012e 02430000
> 001010 000001f2 00000000 00000000 00000000
> 001020 0000812c 0000812d 0000812e 2e422b21
> 001030 0000000d 00000000 00000000 00000000
> 001040 00010000 00010001 00010002 27630074
> 001050 000000b8 00000000 00000000 00000000
> 001060 0001812c 0001812d 0001812e 27a70b8a
> 001070 00000231 00000000 00000000 00000000
> 001080 00020000 00020001 00020002 2cc10000
> 001090 00000008 00000000 00000000 00000000
> 0010a0 0002812c 0002812d 0002812e 25660134
> 0010b0 00000255 00000000 00000000 00000000
> 0010c0 00030000 00030001 00030002 17a50003
> 0010d0 000001c6 00000000 00000000 00000000
> 0010e0 0003812c 0003812d 0003812e 27a70000
> 0010f0 00000048 00000000 00000000 00000000
> 001100 00040000 00040001 00040002 2f8b0000
>
> See the nearly regular incrementing sequence every 0x20 bytes:
>
> 0000012c, 0000812c, 00010000, 0001812c, 00020000, 0002812c, 00030000,
> 0003812c
>
> Each group descriptor block (4kB = 0x1000) covers 16GB of filesystem
> space, so 64 blocks per 1TB of filesystem size. If your RAID chunk size
> is not too large, and the filesystem IS large, you will be able to fully
> order your disks in the RAID set. You can also verify the RAID chunk
> size by determining how many blocks of consecutive group descriptors are
> present before there is a "jump" where the group descriptor blocks were
> written to other disks before returning to the current disk. Remember
> that one of the disks in the set will also need to store parity, so
> there will be some number of "garbage" blocks before the proper data
> resumes.

This seems a great idea. The 4.5 TB array is huge (it should have a group
descriptor table of about 1100 kB), and that table likely extends across
all of the disks. I have already found the pattern, but the job requires
programming, since it would be tiresome to read megabytes of dumps by eye
over hundreds of permutations. I will try coding it, but I hope somebody
has written it before. Isn't there any utility that will take a group
descriptor table and verify its integrity without modifying it?
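In case nothing exists, here is a minimal read-only sketch of what I have
in mind (it assumes plain 32-byte ext3 descriptors, takes the block size,
group count, and blocks-per-group from the primary superblock, and uses
/dev/md2 only as an example):

    #!/usr/bin/env python3
    # Read-only sanity check of a candidate array's group descriptor
    # table: the bg_block_bitmap field of consecutive descriptors should
    # step upward by roughly one group, as in the od dump above.  Prints
    # a score; the correct permutation should score near 100%.
    import struct
    import sys

    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/md2"

    with open(dev, "rb") as f:
        f.seek(1024)                      # primary superblock lives at 1 kB
        sb = f.read(1024)
        if struct.unpack_from("<H", sb, 56)[0] != 0xEF53:
            sys.exit(f"{dev}: no ext2/ext3 magic at 1 kB; wrong first disk?")
        blocks_count, = struct.unpack_from("<I", sb, 4)    # s_blocks_count
        first_block, = struct.unpack_from("<I", sb, 20)    # s_first_data_block
        log_bs, = struct.unpack_from("<I", sb, 24)         # s_log_block_size
        per_group, = struct.unpack_from("<I", sb, 32)      # s_blocks_per_group
        bs = 1024 << log_bs
        groups = -(-blocks_count // per_group)             # ceiling division
        f.seek((first_block + 1) * bs)    # descriptor table follows the sb
        gdt = f.read(groups * 32)

    ok = 0
    prev, = struct.unpack_from("<I", gdt, 0)               # group 0 bitmap
    for g in range(1, groups):
        bitmap, = struct.unpack_from("<I", gdt, g * 32)    # bg_block_bitmap
        if 0 < bitmap - prev <= 2 * per_group:             # ~one group ahead
            ok += 1
        prev = bitmap
    print(f"{dev}: {ok} of {groups - 1} descriptor steps look consistent")

Scoring every permutation this way and keeping the best candidate would
avoid reading the dumps by eye, and it never writes to the filesystem.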
>> I also ran dumpe2fs /dev/md2, but I don't know how to make it more
>> useful than it is now. Right now it finds superblocks in a series of
>> permutations, so again, it is not of much help.
>
> I would also make sure that you can get the correct ordering and MD
> chunk size before doing ANY kind of modification to the disks. It would
> only take a single mistake (e.g. RAID parity rebuild while not in the
> right order) to totally corrupt the filesystem.
>
>> Question 1: Is there a way to make dumpe2fs or another command
>> estimate the number of files in what appears to be an ext3 partition?
>> (I would then go by the permutation which finds the largest number of
>> files.)
>>
>> Question 2: If I were to strike it lucky and find the right
>> combination, would dumpe2fs give me a very, very long list of
>> superblocks? Do the superblocks extend far into the partition, or do
>> they always stop early (thus showing the same number each time my RAID
>> starts with the right drive)?
>>
>> Question 3: Is there any other tool that would search for files in the
>> remains of an ext3 partition and, this way, validate or invalidate the
>> permutations I try?
>>
>> Thanks,
>> Lucian Sandor
>>
>> 2009/12/9 Eric Sandeen <sandeen@xxxxxxxxxx>:
>>> Lucian Șandor wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Somehow I managed to mess with a RAID array containing an ext3
>>>> partition.
>>>>
>>>> Parenthesis, if it matters: I physically disconnected a drive while
>>>> the array was online. Next thing, I lost the right order of the
>>>> drives in the array. While trying to re-create it, I overwrote the
>>>> RAID superblocks. Luckily, the array was RAID5 degraded, so whenever
>>>> I re-created it, it didn't go into sync; thus, everything besides
>>>> the RAID superblocks is preserved (or so I think).
>>>>
>>>> Now I am trying to re-create the array in the proper order. It takes
>>>> me countless attempts, through hundreds of permutations. I am doing
>>>> it programmatically, but I don't think I have the right tool.
>>>> After creating the array and mounting it with
>>>> mount -t ext3 -n -r /dev/md2 /media/olddepot
>>>> I issue:
>>>> e2fsck -n -f /media/olddepot
>>>> However, I cycled through all the permutations without apparent
>>>> success. I.e., in all combinations it just refused to check, saying
>>>> something about a "short read" and, of course, about invalid file
>>>> systems.
>>>
>>> As Christian pointed out, use the device, not the mountpoint, for the
>>> fsck argument:
>>>
>>> [tmp]$ mkdir dir
>>> [tmp]$ e2fsck -fn dir/
>>> e2fsck 1.41.4 (27-Jan-2009)
>>> e2fsck: Attempt to read block from filesystem resulted in short read
>>> while trying to open dir/
>>> Could this be a zero-length partition?
>>>
>>> :)
>>>
>>> -Eric
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
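For completeness, this is roughly what my permutation driver looks like
now, pinned to the original chunk size. One slot stays "missing" so the
array remains degraded and MD never starts a parity resync (mdadm
--create still rewrites the MD superblocks on the members, but those are
already lost here anyway). The device names, the chunk size, and the
gdt_check.py helper (the descriptor check sketched above, under a made-up
name) are all illustrative:

    #!/usr/bin/env python3
    # Hypothetical driver: re-create the degraded array in each candidate
    # order and score it with a read-only check, never letting MD resync.
    import itertools
    import subprocess

    SLOTS = ["/dev/sda1", "/dev/sdb1", "/dev/sdc1",
             "/dev/sdd1", "/dev/sde1", "missing"]  # example devices
    CHUNK = "64"                                   # original chunk size, kB

    for order in itertools.permutations(SLOTS):
        subprocess.run(["mdadm", "--stop", "/dev/md2"],
                       stderr=subprocess.DEVNULL)
        subprocess.run(["mdadm", "--create", "/dev/md2", "--level=5",
                        "--raid-devices", str(len(SLOTS)),
                        "--chunk", CHUNK, "--assume-clean", "--run",
                        *order], check=True)
        # score this ordering without touching the filesystem
        subprocess.run(["python3", "gdt_check.py", "/dev/md2"])

Six slots, five disks plus the "missing" placeholder, give exactly the
720 orderings mentioned above.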