Hi!

On 10.07.2012 at 15:02, NeilBrown wrote:

> On Tue, 10 Jul 2012 12:46:19 +0200 Sebastian Hegler <sebastian.hegler@xxxxxxxxxxxxx> wrote:
>> I had to shut down a server with a raid grow operation (cleanly). After some hassles I got the array assembled again (manually), but it would not continue to grow. When trying
>>
>> mdadm --grow --continue /dev/md127
>>
>> (as per the manpage) I receive a segfault. Compiling from source, and using gdb, I see:
>>
>> root@kuiper:~/mdadm-3.2.5# gdb ./mdadm
>> [SNIP]
>> Reading symbols from /root/mdadm-3.2.5/mdadm...done.
>> (gdb) set args --grow --continue /dev/md127
>> (gdb) run
>> Starting program: /root/mdadm-3.2.5/mdadm --grow --continue /dev/md127
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> Grow_continue_command (devname=0x7fffffffe8cc "/dev/md127", fd=7, backup_file=0x0, verbose=0) at Grow.c:4118
>> 4118 if (verify_reshape_position(content,
>> (gdb) bt
>> #0 Grow_continue_command (devname=0x7fffffffe8cc "/dev/md127", fd=7, backup_file=0x0, verbose=0) at Grow.c:4118
>> #1 0x0000000000407ac2 in main (argc=4, argv=0x7fffffffe678) at mdadm.c:1701
>>
>>
>>
>> The same bug is present in the git repository, but another location:
>>
>> root@kuiper:~/mdadm.git# gdb ./mdadm
>> [SNIP]
>> Reading symbols from /root/mdadm.git/mdadm...done.
>> (gdb) set args --grow --continue /dev/md127
>> (gdb) run
>> Starting program: /root/mdadm.git/mdadm --grow --continue /dev/md127
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> Grow_continue_command (devname=0x7fffffffe8d0 "/dev/md127", fd=7, backup_file=0x0, verbose=0) at Grow.c:4086
>> 4086 if (verify_reshape_position(content,
>> (gdb) bt
>> #0 Grow_continue_command (devname=0x7fffffffe8d0 "/dev/md127", fd=7, backup_file=0x0, verbose=0) at Grow.c:4086
>> #1 0x0000000000406c39 in main (argc=4, argv=0x7fffffffe678) at mdadm.c:1447
>>
>>
>> I also filed a bug against Debian's BTS, but I did not receive a bug number yet.
>> In the meantime, I'd be very happy about any information on how to get my RAID array back into growing. I'm not on the list, please CC me.
>
> --grow --continue should only be needed if the array was assembled with --freeze-reshape (I think).
>
> Normally, the reshape should continue automatically.

Well, it did not.

> cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdi[14] sdh[22] sdt[11] sdp[16] sdq[13] sdo[17] sdn[18] sdu[12] sdm[19] sdl[20] sdk[21] sdj[15]
      17581629312 blocks super 1.2 level 6, 128k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]

> mdadm -E ....

root@kuiper:~# for i in /sys/block/md127/md/dev-sd* ; do mdadm -E /`basename $i |tr - / ` ; done
/dev/sdh:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : f281deb2:b5de8070:487c1900:0078e6e1

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:45 2012
       Checksum : 16d3efce - correct
         Events : 6245853

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 11
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b67b1847:cb68ac69:7ab0b4ee:894a268b

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:45 2012
       Checksum : 3c147a5b - correct
         Events : 6245853

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 0
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5c4e03cc:e2fe014e:bd3d17a9:517d34f5

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : c9c5a324 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 1
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e276dc2b:048e6135:2364e8d0:00566eed

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 310959e7 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 2
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 547956df:6be81839:89424cbc:664bc6d6

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : bcf68a8b - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 3
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 72a0a33b:faa5ecd4:7b543389:f49798a2

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 4dd0cdb7 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 4
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdn:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 11c4c3ac:ac1be81a:d4750037:45a85a60

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 707b98b0 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 6
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 76692433:04aab129:7f46b4f6:d2d93ffd

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 623ecea5 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 7
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdp:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f62ba0d8:17345ece:55a5d1bf:39540cc5

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 3d50f475 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 9
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdq:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a4091eff:68556278:3a22b428:aa59ea46

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : f89375c5 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 8
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdt:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f8425467:c7533842:df4c4da6:424e107f

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : e05eccb3 - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 10
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdu:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 42683d6d:6479d415:492ede77:6fe9437b
           Name : kuiper:scratch2G (local to host kuiper)
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
   Raid Devices : 12

 Avail Dev Size : 3907028736 (1863.02 GiB 2000.40 GB)
     Array Size : 19535143680 (18630.17 GiB 20003.99 GB)
    Data Offset : 432 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : acde8544:51d21dcd:e6b6827e:c258f274

  Reshape pos'n : 6220766720 (5932.59 GiB 6370.07 GB)
  Delta Devices : 1 (11->12)

    Update Time : Tue Jul 10 16:10:46 2012
       Checksum : 168d5b7a - correct
         Events : 6245852

         Layout : left-symmetric
     Chunk Size : 128K

    Device Role : Active device 5
    Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)

> mdadm -D ...

root@kuiper:~# mdadm -D /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Fri Apr 16 12:26:18 2010
     Raid Level : raid6
     Array Size : 17581629312 (16767.15 GiB 18003.59 GB)
  Used Dev Size : 1953514368 (1863.02 GiB 2000.40 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Tue Jul 10 16:11:23 2012
          State : clean
 Active Devices : 12
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

  Delta Devices : 1, (11->12)

           Name : kuiper:scratch2G (local to host kuiper)
           UUID : 42683d6d:6479d415:492ede77:6fe9437b
         Events : 6245852

    Number   Major   Minor   RaidDevice State
      14       8      128        0      active sync   /dev/sdi
      15       8      144        1      active sync   /dev/sdj
      21       8      160        2      active sync   /dev/sdk
      20       8      176        3      active sync   /dev/sdl
      19       8      192        4      active sync   /dev/sdm
      12      65       64        5      active sync   /dev/sdu
      18       8      208        6      active sync   /dev/sdn
      17       8      224        7      active sync   /dev/sdo
      13      65        0        8      active sync   /dev/sdq
      16       8      240        9      active sync   /dev/sdp
      11      65       48       10      active sync   /dev/sdt
      22       8      112       11      active sync   /dev/sdh

> would be helpful, maybe together with kernel logs if there are any.

Please see the attached files. I truncated the file up until the first segfault of mdadm.
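
With twelve members, the per-device `mdadm -E` output above is tedious to compare by eye. In it, sdh and sdi report 6245853 events while the other ten report 6245852, and all twelve agree on the reshape position. A minimal sketch for condensing such output to one line per member follows; the function name `summarize_examine` is made up for illustration, and it assumes the field layout shown above (a `/dev/...:` header line followed by `Reshape pos'n` and `Events` lines).

```shell
#!/bin/sh
# summarize_examine (hypothetical helper): read concatenated `mdadm -E`
# output on stdin and print one line per member device with its event
# count and reshape position (in sectors).
summarize_examine() {
    awk '
        # A line such as "/dev/sdh:" starts a new member record.
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev); order[++n] = dev }
        # "Reshape pos<apostrophe>n : <sectors> (...)": sectors is field 4.
        $1 == "Reshape" { pos[dev] = $4 }
        # "Events : <count>": count is field 3.
        $1 == "Events"  { ev[dev] = $3 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s events=%s reshape_pos=%s\n", order[i], ev[order[i]], pos[order[i]]
        }
    '
}

# Possible use on a live system (needs root, not run here):
#   for i in /sys/block/md127/md/dev-sd*; do
#       mdadm -E "/dev/${i##*/dev-}"
#   done | summarize_examine
```

The function only parses text, so it can also be fed a saved copy of the output rather than querying the disks again.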
Short summary of the contents:

* Bad interaction: "mpt-sas ioc1" is too slow for md. md assembles RAID arrays from the disks it sees at that point, creating several MD devices: one partially broken and recovering where there would have been no need (had md waited), and two incomplete arrays that actually belong to one array (timestamps: 35.655213, 34.528229). Some disks are also completely missing thanks to failed slots in the enclosure.
* I then removed all drives belonging to the array whose grow operation I had interrupted (via "/sys/block/sd*/delete"), re-inserted them into the enclosure, and re-assembled the RAID array manually. This finishes at "1397.498460": "md/raid:md127: reshape will continue"
* At "1565.278839" the RAID array is in "auto-read-only" mode. I mount the FS and wonder why the enclosure LEDs are not blinking. Checking "/proc/mdstat", I see that everything is quiet, although I expected a progress bar.
* At "2612.464768" (fourth-to-last line) mdadm segfaults on the command "mdadm --grow --continue /dev/md127".

After yet another reboot and careful manual re-assembly of the RAID array in question, everything works fine. The grow operation continued after mounting the file system.

In the meantime, the issue has been assigned bug number 681056 in the Debian BTS; CCing the BTS.

Yours,
Sebastian
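
A note on the "auto-read-only" state mentioned in the summary: as far as I understand md, an array in that state holds back resync and reshape activity until the first write arrives or until it is switched to read-write explicitly, which would fit the quiet "/proc/mdstat". The sketch below checks an mdstat-style file for the "(auto-read-only)" marker. The function name `md_auto_ro` is invented for illustration, and the optional file argument exists only so the check can be tried against a saved copy.

```shell
#!/bin/sh
# md_auto_ro (hypothetical helper): succeed (exit status 0) if array NAME
# is listed with the "(auto-read-only)" marker in an mdstat-style file.
# Usage: md_auto_ro NAME [FILE]     FILE defaults to /proc/mdstat
md_auto_ro() {
    name=$1
    file=${2:-/proc/mdstat}
    # mdstat lists arrays as "md127 : active (auto-read-only) raid6 ...".
    grep -q "^${name} : .*(auto-read-only)" "$file"
}
```

If the check succeeds, `mdadm --readwrite /dev/md127` is the documented way to switch the array to read-write. Whether that alone would have restarted the reshape in this case is an assumption, not something the logs confirm.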
Attachment:
kern.mdcrash.log.xz
Description: Binary data