On Mon, Mar 18, 2024 at 1:18 PM Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> On Sat, 16 Mar 2024 20:26:15 +0200
> Shaya Potter <spotter@xxxxxxxxx> wrote:
>
> > note: not subscribed, so please cc me on responses.
> >
> > I recently had a Dell R710 die where I was using the Perc6 to provide
> > storage to the box. As the box wasn't usable, I decided to image the
> > individual disks to a newer machine with significantly more storage.
> >
> > I sort of messed up the process, but that might have uncovered a bug
> > in mdadm.
> >
> > Background: the Dell R710 supported 6 drives, which I had as a 1TB
> > SATA SSD and 5x8TB SATA disks in a RAID5 array.
> >
> > In the process of imaging it, I was setting up devices on /dev/loop
> > to be prepared to assemble the raid, but I think I accidentally
> > assembled the raid while imaging the last disk (which in effect caused
> > the last disk to get out of sync with the other disks). This was
> > initially ok, until the VM I was doing it on crashed with a KVM/QEMU
> > failure (unsure what occurred).
> >
> > I was hoping it was going to be easy to bring up the raid array
> > again, but now mdadm was segfaulting on a null pointer exception
> > whenever I tried to assemble the array (was just trying the RAID5
> > portion).
> >
> > I was thinking perhaps my VM got corrupted, but I couldn't figure that
> > out, so I decided to try and reimage the disks (more carefully this
> > time), but yes, the 5th disk was marked as in quick init, while the
> > others were more consistent.
> >
> > However, the same segfault was occurring, so I built mdadm from source
> > (with -g and no -O; as an aside, this would be a good Makefile target
> > to have, to make issues easier to debug).
> >
> > After understanding the issue, the segfault seems to be due to
> > Assemble.c wanting to call update_super() with a ddf super. Except
> > super-ddf.c doesn't provide that.
> >
> > i.e. in Assemble.c it was crashing at
> >
> > if (st->ss->update_super(st, &devices[j].i, UOPT_SPEC_ASSEMBLE, NULL,
> >                          c->verbose, 0, NULL)) {...}
> >
> > which now explained the segfault on null pointer exception. I was
> > able to progress past the segfault (perhaps badly, but it "seems" to
> > work for me) by putting in a null check before the update_super()
> > call, i.e.
> >
> > if (st->ss->update_super && st->ss->update_super(....)) { ... }
> >
> > Thoughts about my "fix" (perhaps super-ddf.c needs an empty
> > update_super function?), if this is a bug? (Perhaps it's unexpected
> > for me to have gotten into this state in the first place?)
> >
>
> Hello Shaya,
> DDF is not actively developed. I'm considering dropping it.
> If you are interested in bringing it to life then you are
> more than welcome to send patches!
>
> If DDF doesn't implement update_super() then the fix you proposed
> seems to be valid. Please send a proper patch for that and we will
> review it.
>
> Thanks,
> Mariusz

I'll make a proper patch in the coming days.

Just to note: DDF support is very useful for recovering data from RAID
arrays that use that metadata. It would be a shame (IMO) to lose
support for it, as that would have made my recovery/migration efforts
much more difficult. At worst, I'd suggest marking it unmaintained and
requiring a specific flag to use it, with a note that since it's
unmaintained it might go down code paths that are untested and could
break in the future (i.e. what happened to me).

As a totally separate aside: md seems to perform much better on loop
devices when the loop devices are created with direct-io support.
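
For reference, here is a minimal standalone sketch of the guard pattern I
have in mind: check an optional callback for NULL before calling it. It
is not mdadm code. The names demo_ops, demo_super, demo_update and
try_update are made up for illustration and the signature is simplified;
only the idea of an ops table with an update_super hook that some formats
leave unset comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a metadata handler table. These are NOT the
 * real mdadm structures; the names and the signature are illustrative. */
struct demo_super;

struct demo_ops {
    const char *name;
    /* Optional hook: NULL when a metadata format does not implement it. */
    int (*update_super)(struct demo_super *st, int opt);
};

struct demo_super {
    const struct demo_ops *ss;
    int updates;               /* counts how often the hook actually ran */
};

/* Example hook that succeeds (returns 0) and records that it was called. */
static int demo_update(struct demo_super *st, int opt)
{
    (void)opt;
    st->updates++;
    return 0;
}

/* One format that provides the hook and one that leaves it unset. */
static const struct demo_ops with_hook = { "with-hook", demo_update };
static const struct demo_ops without_hook = { "without-hook", NULL };

/* Guarded call: returns 1 if the hook ran successfully, 0 if it was
 * skipped because the format does not provide one, and -1 if the hook
 * reported an error (non-zero return). */
static int try_update(struct demo_super *st, int opt)
{
    assert(st != NULL && st->ss != NULL);

    if (st->ss->update_super == NULL)
        return 0;              /* nothing to call, so skip instead of crashing */

    return st->ss->update_super(st, opt) ? -1 : 1;
}
```

Whether silently skipping the update is the right behaviour for DDF, as
opposed to giving super-ddf.c an explicit no-op update_super, is the open
question; the sketch only shows the mechanics of the check.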