Hi Robin,

On 03/04/2011 06:27 PM, Robin H. Johnson wrote:
> (Please CC, not subscribed to linux-raid).
>
> Problem summary:
> ----------------
> After a rebuild following disk replacement, the MD array (RAID6, 12 devices)
> appears to have shrunk by 10880KiB. Presumed to be at the start of the device,
> but no confirmation.

Sounds similar to a problem recently encountered by Simon McNeil...

> Background:
> -----------
> I got called in to help a friend with a data loss problem after a catastrophic
> UPS failure which killed at least one motherboard and several disks. Almost
> all of which led to no data loss, except for one system...
>
> For the system in question, one disk died (cciss/c1d12) and was promptly
> replaced, and this problem started when the rebuild kicked in.
>
> Prior to calling me, my friend had already tried a few things from a rescue
> env, which almost certainly made the problem worse, and he doesn't have good
> logs of what he did.

I have a suspicion that 'mdadm --create --assume-clean' or some variant was one
of those attempts, and that the rescue environment carries a version of
mdadm >= 3.1.2; the default metadata alignment changed in that version.

> The MD array held portions of two very large LVM LVs (15TiB and ~20TiB
> respectively). Specifically, the PV on the MD array was a chunk in the middle
> of each of the two LVs.
>
> The kernel version 2.6.35.4 did not change during the power outage.
>
> Problem identification:
> -----------------------
> When bringing the system back online, LVM refused to make one LV accessible,
> as it complained of a shrunk device. One other LV exhibited corruption.
>
> The entry in /proc/partitions noted the array size of 14651023360KiB, while
> older LVM backups showed the usable size of the array to previously be
> 14651034240KiB, a difference of 10880KiB.
>
> The first LV has inaccessible data for all files at or after the missing
> chunk. All files prior to that point are accessible.
>
> LVM refused to bring the second LV online as it complained the physical
> device was now too small for all the extents.
>
> Prior to the outage, 800KiB of the collected devices was used for metadata,
> and post the outage, 11680KiB is used (a difference of 10880KiB).
>
> Questions:
> ----------
> Why did the array shrink? How can I get it back to the original size, or
> accurately identify the missing chunk size and offset, so that I can adjust
> the LVM definitions and recover the other data?

Please share mdadm -E for all of the devices in the problem array, and a sample
of mdadm -E for some of the devices in the working arrays. I think you'll find
differences in the data offset. Newer mdadm aligns to 1MB; older mdadm aligns
to "superblock size + bitmap size".

"mdadm -E /dev/cciss/c1d{12..23}p1" should show us individual device details
for the problem array. (I'll sketch a quick comparison loop below your
/proc/partitions excerpt.)

> Collected information:
> ----------------------
>
> Relevant lines from /proc/partitions:
> =====================================
>     9        3  14651023360 md3
>   105      209   1465103504 cciss/c1d13p1
> ...
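For what it's worth, if I'm reading your numbers right, 10880KiB spread across
the ten data members of a twelve-drive RAID6 is 1088KiB per member, which is
exactly the kind of jump a changed data offset would produce.

Something along these lines (an untested sketch typed from memory; the brace
expansion covers the problem array, and you'd repeat it with whichever device
names belong to your healthy arrays) makes the offset comparison quick:

    # Dump the fields that matter from every member of the problem array.
    for d in /dev/cciss/c1d{12..23}p1; do
        echo "== $d =="
        mdadm -E "$d" | grep -E 'Data Offset|Super Offset|Avail Dev Size|Used Dev Size|Device Role'
    done
    # Run the same loop against a couple of members of the working arrays
    # and compare the "Data Offset" values side by side.

If the problem array's Data Offset is about 1088KiB (roughly 1MiB) larger than
the working arrays', that pretty much confirms a re-create with a newer mdadm.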
>
> Line from mdstat right now:
> ===========================
> md3 : active raid6 cciss/c1d18p1[5] cciss/c1d17p1[4] cciss/c1d13p1[0]
>       cciss/c1d21p1[8] cciss/c1d20p1[7] cciss/c1d19p1[6] cciss/c1d15p1[2]
>       cciss/c1d12p1[12] cciss/c1d14p1[1] cciss/c1d23p1[10] cciss/c1d16p1[3]
>       cciss/c1d22p1[9]
>       14651023360 blocks super 1.2 level 6, 64k chunk, algorithm 2
>       [12/12] [UUUUUUUUUUUU]
>
> MDADM output:
> =============
> # mdadm --detail /dev/md3
> /dev/md3:
>         Version : 1.2
>   Creation Time : Wed Feb 16 19:53:05 2011
>      Raid Level : raid6
>      Array Size : 14651023360 (13972.30 GiB 15002.65 GB)
>   Used Dev Size : 1465102336 (1397.23 GiB 1500.26 GB)
>    Raid Devices : 12
>   Total Devices : 12
>     Persistence : Superblock is persistent
>
>     Update Time : Fri Mar  4 17:19:43 2011
>           State : clean
>  Active Devices : 12
> Working Devices : 12
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            Name : CENSORED:3  (local to host CENSORED)
>            UUID : efa04ecf:4dbd0bfa:820a5942:de8a234f
>          Events : 25
>
>     Number   Major   Minor   RaidDevice State
>        0     105      209        0      active sync   /dev/cciss/c1d13p1
>        1     105      225        1      active sync   /dev/cciss/c1d14p1
>        2     105      241        2      active sync   /dev/cciss/c1d15p1
>        3     105      257        3      active sync   /dev/cciss/c1d16p1
>        4     105      273        4      active sync   /dev/cciss/c1d17p1
>        5     105      289        5      active sync   /dev/cciss/c1d18p1
>        6     105      305        6      active sync   /dev/cciss/c1d19p1
>        7     105      321        7      active sync   /dev/cciss/c1d20p1
>        8     105      337        8      active sync   /dev/cciss/c1d21p1
>        9     105      353        9      active sync   /dev/cciss/c1d22p1
>       10     105      369       10      active sync   /dev/cciss/c1d23p1
>       12     105      193       11      active sync   /dev/cciss/c1d12p1

The lowest device node (c1d12p1, the replaced disk) now holds the last device
role? Any chance these are also out of order?

> LVM PV definition:
> ==================
> pv1 {
>     id = "CENSORED"
>     device = "/dev/md3"          # Hint only
>     status = ["ALLOCATABLE"]
>     flags = []
>     dev_size = 29302068480       # 13.6448 Terabytes
>     pe_start = 384
>     pe_count = 3576912           # 13.6448 Terabytes
> }

It would be good to know where the LVM PV signature is on the problem array's
devices, and which member has it. LVM stores a text copy of the VG's
configuration in its metadata blocks at the beginning of a PV, so you should
find it on the true "Raid Device 0", at the original MD data offset from the
beginning of the device.

I suggest scripting a loop through each device, piping the first 1MB (with dd)
to "strings -t x", and grepping for the PV uuid in clear text.
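Something like this (an untested sketch; substitute the real PV uuid from your
LVM metadata backup for the placeholder, since the value is censored above)
should locate it:

    # PV uuid as it appears in your LVM backup (shown as CENSORED above)
    PVUUID='PASTE-THE-PV-UUID-HERE'
    for d in /dev/cciss/c1d{12..23}p1; do
        echo "== $d =="
        # First 1MB of each member; "strings -t x" prints hit offsets in hex
        dd if="$d" bs=1M count=1 2>/dev/null | strings -t x | grep -E "LABELONE|$PVUUID"
    done

The hex offsets reported for the "LABELONE" label and the VG metadata text
should line up with the original MD data offset, and which member carries them
tells you where the start of the PV really lives.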
> LVM segments output:
> ====================
>
> # lvs --units 1m --segments \
>     -o lv_name,lv_size,seg_start,seg_start_pe,seg_size,seg_pe_ranges \
>     vg/LV1 vg/LV2
>   LV   LSize     Start     Start   SSize    PE Ranges
>   LV1  15728640m        0m       0 1048576m /dev/md2:1048576-1310719
>   LV1  15728640m  1048576m  262144 1048576m /dev/md2:2008320-2270463
>   LV1  15728640m  2097152m  524288 7936132m /dev/md3:1592879-3576911
>   LV1  15728640m 10033284m 2508321  452476m /dev/md4:2560-115678
>   LV1  15728640m 10485760m 2621440 5242880m /dev/md4:2084381-3395100
>   LV2  20969720m        0m       0 4194304m /dev/md2:0-1048575
>   LV2  20969720m  4194304m 1048576 1048576m /dev/md2:1746176-2008319
>   LV2  20969720m  5242880m 1310720  456516m /dev/md2:2270464-2384592
>   LV2  20969720m  5699396m 1424849  511996m /dev/md2:1566721-1694719
>   LV2  20969720m  6211392m 1552848       4m /dev/md2:1566720-1566720
>   LV2  20969720m  6211396m 1552849 6371516m /dev/md3:0-1592878
>   LV2  20969720m 12582912m 3145728  512000m /dev/md2:1438720-1566719
>   LV2  20969720m 13094912m 3273728 7874808m /dev/md4:115679-2084380
>

If my suspicions are right, you'll have to use an old version of mdadm to redo
an 'mdadm --create --assume-clean'.

HTH,

Phil
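P.S. If the -E output does confirm an offset change and it comes to a
re-create, it would look something like the sketch below. Treat it as a rough
outline only: don't run anything like it until the old data offset and the
true device order are confirmed, use an mdadm old enough (pre-3.1.2) to
reproduce the original offset, and verify the LVs before writing anything.
The device order shown is simply what your --detail output reports today,
which may itself be wrong.

    # DANGEROUS - rough sketch only, with an old (pre-3.1.2) mdadm; order unverified.
    # Add --bitmap=internal only if the original array had an internal bitmap.
    mdadm --create /dev/md3 --assume-clean --metadata=1.2 \
          --level=6 --raid-devices=12 --chunk=64 --layout=left-symmetric \
          /dev/cciss/c1d13p1 /dev/cciss/c1d14p1 /dev/cciss/c1d15p1 \
          /dev/cciss/c1d16p1 /dev/cciss/c1d17p1 /dev/cciss/c1d18p1 \
          /dev/cciss/c1d19p1 /dev/cciss/c1d20p1 /dev/cciss/c1d21p1 \
          /dev/cciss/c1d22p1 /dev/cciss/c1d23p1 /dev/cciss/c1d12p1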