I wanted to provide an update on this thread. First of all, thank you for
all the insights and recommendations. I finally found a way to recover my
data and wanted to pass along what the fix was in case someone stumbles
across this exact scenario.

Summary below:

- I believe there is some kind of problem with the kernel or a module in
  5.14.0-319.el9.x86_64 for my controller (ASMedia ASM1064 chipset), which
  I believe was responsible for the drives attached to it disappearing
  while my grow from RAID 5 to RAID 6 was taking place.
- After the above event (and rebooting), whenever I tried to assemble the
  RAID to resume the rebuild, mdadm would hang as previously described in
  this thread.
- After Yu pointed me to a patch that might have bypassed the issue, I
  decided to first boot the system on a rescue disk with an older kernel
  (3.x) and mdadm version.
- Fortunately, my assemble succeeded, the grow resumed, and the slow
  rebuild of my 30TB array completed 17 days later.
- My ASMedia ASM1064 chipset controller was 100% stable for the 17 days of
  rebuild on the old kernel.
- As soon as I went back to my 5.14.0-319.el9.x86_64 kernel, my ASMedia
  ASM1064 controller started showing ata timeout errors and drives started
  disappearing again.
- I ended up just purchasing another controller with a different chipset
  (Marvell 88SE9215) out of desperation, and the system is finally stable
  and my data is all intact!

Again, thank you everyone for the help!

--David

On Mon, May 8, 2023 at 8:33 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/09 6:53, Roger Heflin wrote:
> > On Mon, May 8, 2023 at 6:57 AM David Gilmour <dgilmour76@xxxxxxxxx> wrote:
> >>
> >> Ok, well I'm willing to try anything at this point. Do you need
> >> anything from me for a patch?
> >> Here are my current kernel details:
> >
> > grep -i mdadm /etc/udev/rules.d/* /lib/udev/rules.d/*
> >
> > If you can find a udev rule that starts up the monitor then move that
> > rule out of the directory, so that on the next assemble try it does
> > not get started.
> >
> > If this is the recent bug that is being discussed then anything
> > accessing the array after the reshape will deadlock the array and the
> > reshape.
>
> It's not anything accessing the array; in fact, only io across the
> reshape position can trigger the deadlock.
>
> I just posted a fix patch in the other thread that fails such io while
> reshape can't make progress. However, I'm not sure for now whether this
> will break mdadm; for example, must mdadm read something from the array
> to make progress?
>
> Thanks,
> Kuai
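
For anyone who wants to try the udev check Roger suggested in the quoted text, here is a minimal sketch of it as a small shell helper. The function name list_mdadm_rules is my own (hypothetical), and it only lists the rule files that mention mdadm. It does not move or change anything, so deciding which rule (if any) starts the monitor, and moving it aside, is still a manual step.

```shell
#!/bin/sh
# Sketch only: list udev rule files that mention mdadm, one path per line.
# list_mdadm_rules is a hypothetical helper name, not part of mdadm or udev.
list_mdadm_rules() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -l prints only the names of matching files, -i ignores case.
        grep -l -i mdadm "$dir"/* 2>/dev/null
    done
    return 0
}

# The same two directories used in the grep command quoted above.
list_mdadm_rules /etc/udev/rules.d /lib/udev/rules.d
```

Any file it prints still needs to be read by hand to see whether it actually starts the monitor before moving it out of the directory.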