I won't buy a Marvell. I had some sort of 92xx variant (2 PCIe lanes),
and it had a bad habit of stopping working under load and taking all 4
ports offline. The board maker blamed the driver, but the driver was the
generic AHCI one, so it is unlikely to be the issue: all the other ports
were also AHCI and were just fine.

The best bet is a used LSI SAS controller. You can get 8 ports off of
one, but you may need a breakout cable to attach 4 SATA devices.

On Fri, Jun 23, 2023 at 2:17 PM David Gilmour <dgilmour76@xxxxxxxxx> wrote:
>
> I wanted to provide an update on this thread. First of all thank you
> for all the insights and recommendations. I finally found a way to
> recover my data and wanted to pass along what the fix was in the event
> someone stumbles across this exact scenario. Summary below:
>
> - I believe there is some kind of problem with the kernel or a module in
> 5.14.0-319.el9.x86_64 for my controller (ASMedia ASM1064 chipset),
> which I believe was responsible for the drives attached to it
> disappearing while my grow from raid 5 to raid 6 was taking place
> - After the above event (and rebooting), whenever I tried to assemble
> the raid to resume the rebuild, mdadm would hang as
> previously described in this thread.
> - After Yu pointed me to a patch that might bypass the issue, I
> decided to first boot the system on a rescue disk with an older kernel
> (3.x) and mdadm version
> - Fortunately, my assemble succeeded, the grow resumed, and the
> slow rebuild of my 30TB array completed 17 days later
> - My ASMedia ASM1064 chipset controller was 100% stable for the 17
> days of rebuild on the old kernel
> - As soon as I went back to my 5.14.0-319.el9.x86_64 kernel, my
> ASMedia ASM1064 controller started showing ata timeout errors and
> drives disappearing again
> - I ended up just purchasing another controller with a different
> chipset (Marvell 88SE9215) out of desperation, and the system is
> finally stable and my data is all intact!
>
> Again thank you everyone for the help!
>
> --David
>
>
> On Mon, May 8, 2023 at 8:33 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 2023/05/09 6:53, Roger Heflin wrote:
> > > On Mon, May 8, 2023 at 6:57 AM David Gilmour <dgilmour76@xxxxxxxxx> wrote:
> > >>
> > >> Ok, well I'm willing to try anything at this point. Do you need
> > >> anything from me for a patch? Here is my current kernel details:
> > >
> > > grep -i mdadm /etc/udev/rules.d/* /lib/udev/rules.d/*
> > >
> > > If you can find a udev rule that starts up the monitor then move that
> > > rule out of the directory, so that on the next assemble try it does
> > > not get started.
> > >
> > > If this is the recent bug that is being discussed then anything
> > > accessing the array after the reshape will deadlock the array and the
> > > reshape.
> >
> > It's not anything accessing the array; in fact, only I/O across the
> > reshape position can trigger the deadlock.
> >
> > I just posted a fix patch in the other thread that fails such I/O while
> > the reshape can't make progress. However, I'm not sure for now whether
> > this will break mdadm. For example, must mdadm read something from the
> > array to make progress?
> >
> > Thanks,
> > Kuai
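
For anyone retracing the udev step Roger describes in the quoted thread
above, here is a minimal sketch of finding the rule files that mention
mdadm and moving them aside before the next assemble attempt. The
function names and the holding directory are hypothetical (not from the
thread), and it only assumes the two rule directories from the quoted
grep command. It matches any rule file that mentions mdadm, so check the
list before parking anything.

```shell
#!/bin/sh
# Sketch only: function names and the holding directory are hypothetical.

# Print every udev rule file that mentions mdadm (case-insensitive),
# one path per line. With no arguments, search the two directories
# named in the quoted grep command.
list_mdadm_rules() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/udev/rules.d /lib/udev/rules.d
    fi
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -l prints matching file names only; no match is not an error here.
        grep -il mdadm "$dir"/* 2>/dev/null || true
    done
}

# Move the matching rule files into a holding directory so they are not
# picked up on the next assemble. The original path is encoded in the new
# file name ("/" becomes "_") so same-named files from different
# directories do not overwrite each other.
park_mdadm_rules() {
    holding_dir=$1
    shift
    mkdir -p "$holding_dir" || return 1
    list_mdadm_rules "$@" | while IFS= read -r rule; do
        dest=$(printf '%s' "$rule" | tr '/' '_')
        mv -- "$rule" "$holding_dir/$dest"
    done
}
```

Running `list_mdadm_rules` on its own first shows what would be moved;
`park_mdadm_rules /root/udev-held` (as root, path chosen arbitrarily)
then moves those files, and they can be moved back by hand once the
reshape has finished.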