On Tue, Aug 08 2017, Andrea Righi wrote:
> I was able to trigger this curious problem that seems to happen only on
> one of our servers:
>
> # mdadm --assemble /dev/md/10.4.237.12-volume --name 10.4.237.12-volume
> Segmentation fault
>
> This md volume is a raid1 volume made of 2 device mapper (dm-multipath)
> devices and the underlying LUNs are imported via iSCSI.
>
> Applying the following patch (see below) seems to fix the problem:
>
> # ./mdadm --assemble /dev/md/10.4.237.12-volume --name 10.4.237.12-volume
> mdadm: /dev/md/10.4.237.12-volume has been started with 2 drives.
>
> But I'm not sure if it's the right fix or if there are some other
> problems that I'm missing.
>
> More details about the md superblocks that might help to better
> understand the nature of the problem:
>
> # for i in 36001405a04ed0c104881{1,2}00000000000p2; do echo dev: ${i}; mdadm --examine /dev/mapper/${i}; done
> dev: 36001405a04ed0c104881100000000000p2
> /dev/mapper/36001405a04ed0c104881100000000000p2:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 5f3e8283:7f831b85:bc1958b9:6f2787a4
>            Name : 10.4.237.12-volume
>   Creation Time : Thu Jul 27 14:43:16 2017
>      Raid Level : raid1
>    Raid Devices : 2
>
>  Avail Dev Size : 1073729503 (511.99 GiB 549.75 GB)
>      Array Size : 536864704 (511.99 GiB 549.75 GB)
>   Used Dev Size : 1073729408 (511.99 GiB 549.75 GB)
>     Data Offset : 8192 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=8104 sectors, after=95 sectors
>           State : clean
>     Device UUID : 16dae7e3:42f3487f:fbeac43a:71cf1f63
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Tue Aug 8 11:12:22 2017
>   Bad Block Log : 512 entries available at offset 72 sectors
>        Checksum : 518c443e - correct
>          Events : 167
>
>
>    Device Role : Active device 0
>    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
> dev: 36001405a04ed0c104881200000000000p2
> /dev/mapper/36001405a04ed0c104881200000000000p2:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 5f3e8283:7f831b85:bc1958b9:6f2787a4
>            Name : 10.4.237.12-volume
>   Creation Time : Thu Jul 27 14:43:16 2017
>      Raid Level : raid1
>    Raid Devices : 2
>
>  Avail Dev Size : 1073729503 (511.99 GiB 549.75 GB)
>      Array Size : 536864704 (511.99 GiB 549.75 GB)
>   Used Dev Size : 1073729408 (511.99 GiB 549.75 GB)
>     Data Offset : 8192 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=8104 sectors, after=95 sectors
>           State : clean
>     Device UUID : ef612bdd:e475fe02:5d3fc55e:53612f34
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Tue Aug 8 11:12:22 2017
>   Bad Block Log : 512 entries available at offset 72 sectors
>        Checksum : c39534fd - correct
>          Events : 167
>
>
>    Device Role : Active device 1
>    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
>
> # for i in 36001405a04ed0c104881{1,2}00000000000p2; do echo dev: ${i}; hexdump -s 4096 -n 4189696 -C /dev/mapper/${i}; done
> dev: 36001405a04ed0c104881100000000000p2
> 00001000  fc 4e 2b a9 01 00 00 00  01 00 00 00 00 00 00 00  |.N+.............|
> 00001010  5f 3e 82 83 7f 83 1b 85  bc 19 58 b9 6f 27 87 a4  |_>........X.o'..|
> 00001020  31 30 2e 34 2e 32 33 37  2e 31 32 2d 76 6f 6c 75  |10.4.237.12-volu|
> 00001030  6d 65 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |me..............|
> 00001040  64 50 7a 59 00 00 00 00  01 00 00 00 00 00 00 00  |dPzY............|
> 00001050  80 cf ff 3f 00 00 00 00  00 00 00 00 02 00 00 00  |...?............|
> 00001060  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001080  00 20 00 00 00 00 00 00  df cf ff 3f 00 00 00 00  |. .........?....|
> 00001090  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 000010a0  00 00 00 00 00 00 00 00  16 da e7 e3 42 f3 48 7f  |............B.H.|
> 000010b0  fb ea c4 3a 71 cf 1f 63  00 00 08 00 48 00 00 00  |...:q..c....H...|
> 000010c0  54 f0 89 59 00 00 00 00  a7 00 00 00 00 00 00 00  |T..Y............|
> 000010d0  ff ff ff ff ff ff ff ff  9c 43 8c 51 80 00 00 00  |.........C.Q....|
> 000010e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001100  00 00 01 00 fe ff fe ff  fe ff fe ff fe ff fe ff  |................|
> 00001110  fe ff fe ff fe ff fe ff  fe ff fe ff fe ff fe ff  |................|
> *
> 00001200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00002000  62 69 74 6d 04 00 00 00  5f 3e 82 83 7f 83 1b 85  |bitm...._>......|
> 00002010  bc 19 58 b9 6f 27 87 a4  a7 00 00 00 00 00 00 00  |..X.o'..........|
> 00002020  a7 00 00 00 00 00 00 00  80 cf ff 3f 00 00 00 00  |...........?....|
> 00002030  00 00 00 00 00 00 00 01  05 00 00 00 00 00 00 00  |................|
> 00002040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00003100  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
> *
> 00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 003ffe00
> dev: 36001405a04ed0c104881200000000000p2
> 00001000  fc 4e 2b a9 01 00 00 00  01 00 00 00 00 00 00 00  |.N+.............|
> 00001010  5f 3e 82 83 7f 83 1b 85  bc 19 58 b9 6f 27 87 a4  |_>........X.o'..|
> 00001020  31 30 2e 34 2e 32 33 37  2e 31 32 2d 76 6f 6c 75  |10.4.237.12-volu|
> 00001030  6d 65 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |me..............|
> 00001040  64 50 7a 59 00 00 00 00  01 00 00 00 00 00 00 00  |dPzY............|
> 00001050  80 cf ff 3f 00 00 00 00  00 00 00 00 02 00 00 00  |...?............|
> 00001060  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00001080  00 20 00 00 00 00 00 00  df cf ff 3f 00 00 00 00  |. .........?....|
> 00001090  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 000010a0  01 00 00 00 00 00 00 00  ef 61 2b dd e4 75 fe 02  |.........a+..u..|
> 000010b0  5d 3f c5 5e 53 61 2f 34  00 00 08 00 48 00 00 00  |]?.^Sa/4....H...|
> 000010c0  54 f0 89 59 00 00 00 00  a7 00 00 00 00 00 00 00  |T..Y............|
> 000010d0  ff ff ff ff ff ff ff ff  5b 34 95 c3 80 00 00 00  |........[4......|
> 000010e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00001100  00 00 01 00 fe ff fe ff  fe ff fe ff fe ff fe ff  |................|
> 00001110  fe ff fe ff fe ff fe ff  fe ff fe ff fe ff fe ff  |................|
> *
> 00001200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00002000  62 69 74 6d 04 00 00 00  5f 3e 82 83 7f 83 1b 85  |bitm...._>......|
> 00002010  bc 19 58 b9 6f 27 87 a4  a7 00 00 00 00 00 00 00  |..X.o'..........|
> 00002020  a7 00 00 00 00 00 00 00  80 cf ff 3f 00 00 00 00  |...........?....|
> 00002030  00 00 00 00 00 00 00 01  05 00 00 00 00 00 00 00  |................|
> 00002040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00003100  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
> *
> 00004000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 003ffe00
>
> ---
> Assemble: prevent segfault with faulty "best" devices
>
> In Assemble(), after context reload, best[i] can be -1 in some cases,
> and before checking if this value is negative we use it to access
> devices[j].i.disk.raid_disk, potentially causing a segfault.
>
> Check if best[i] is negative before using it to prevent this potential
> segfault.
>
> Signed-off-by: Andrea Righi <andrea@xxxxxxxxxxxxxxx>
> Signed-off-by: Robert LeBlanc <robert@xxxxxxxxxxxxx>
> ---
>  Assemble.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Assemble.c b/Assemble.c
> index 3da0903..fc681eb 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -1669,6 +1669,8 @@ try_again:
>  		int j = best[i];
>  		unsigned int desired_state;
>  
> +		if (j < 0)
> +			continue;
>  		if (devices[j].i.disk.raid_disk == MD_DISK_ROLE_JOURNAL)
>  			desired_state = (1<<MD_DISK_JOURNAL);
>  		else if (i >= content->array.raid_disks * 2)
> @@ -1678,8 +1680,6 @@ try_again:
>  		else
>  			desired_state = (1<<MD_DISK_ACTIVE) | (1<<MD_DISK_SYNC);
>  
> -		if (j<0)
> -			continue;
>  		if (!devices[j].uptodate)
>  			continue;

Patch looks good to me, thanks.
The regression was caused by commit 69a481166be6 ("Assemble array with write journal"),
which introduced a use of 'j' before the test of whether it was < 0.

Fixes: 69a481166be6 ("Assemble array with write journal")
Reviewed-by: NeilBrown <neilb@xxxxxxxx>

Thanks,
NeilBrown