Hi, all,

I have several ASMedia USB 3.x host controllers (ASM2142 and ASM3142, both share the same Vendor ID/Device ID pair) that I'd like to use with a POWER9 system (a Raptor Computing Systems Talos II). Unfortunately, while the kernel recognizes the controllers just fine, as soon as I plug in a device, an EEH error occurs and the host controller gets repeatedly reset until it eventually gets disabled.

An example of one of these errors can be seen here:
https://paste.debian.net/hidden/e39698eb

Based on the "PHB4 Diag-data" reported by the kernel, it seems that LEM_WOF_R bit 35, PHB_FESR bit 20, and RXE_ARB_FESR bit 28 have been set. According to the PHB4 specification (https://ibm.ent.box.com/s/jftnfhceul07qjh9jtn91xwjmclabc71), they respectively mean the following:

- ARB: IODA TVT Errors - "TCE Validation Table error occurred. The entry is invalid, or the PCI Address was out of range as defined by the TTA bounds in the TVE entry."
- RXE_ARB OR Error Status - "RXE_ARB error bits, ... OR of all error status bits."
- IODA TVT Address Range Error - "IODA Error: The PCI Address was out of range as defined by the TTA bounds in the TVE entry."

In other words, the ASMedia USB controllers seem to be trying to write to addresses they're not supposed to, and thankfully the PHB4 is catching these bad writes before they can cause any corruption of my system's memory.

Of course, this has the unfortunate side-effect that these devices are completely unable to operate with my computer, and since it seems to be possible to use these controllers on x86 systems (presumably because of the less-strict/disabled-by-default IOMMU), I wonder if maybe it would be possible to work around these errors in either the kernel or the OPAL firmware?
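
As an aside on reading those bit numbers: IBM documentation usually numbers register bits from the most significant end (bit 0 is the MSB of a 64-bit register), and I'm assuming the PHB4 diag-data follows that convention. Under that assumption, a minimal Python sketch for turning the flagged bit numbers into masks might look like the following. The helper names (`ppc_bit`, `set_bits`) are made up for illustration, and no register values from the actual kernel log are used.

```python
REG_BITS = 64  # assumption: the PHB4 diag registers are 64 bits wide


def ppc_bit(n: int) -> int:
    """Mask for bit n in MSB-0 numbering (bit 0 is the most significant bit)."""
    if not 0 <= n < REG_BITS:
        raise ValueError(f"bit {n} is outside a {REG_BITS}-bit register")
    return 1 << (REG_BITS - 1 - n)


def set_bits(value: int) -> list[int]:
    """List the MSB-0 bit numbers that are set in a register value."""
    return [n for n in range(REG_BITS) if value & ppc_bit(n)]


# Register -> (bit number, meaning), as listed above from the PHB4 spec.
FLAGGED = {
    "LEM_WOF_R": (35, "ARB: IODA TVT Errors"),
    "PHB_FESR": (20, "RXE_ARB OR Error Status"),
    "RXE_ARB_FESR": (28, "IODA TVT Address Range Error"),
}

for reg, (bit, meaning) in FLAGGED.items():
    print(f"{reg}: bit {bit} -> mask {ppc_bit(bit):#018x} ({meaning})")
```

Given a raw 64-bit value copied from a diag-data dump, `set_bits(value)` would list which bit numbers to look up in the spec. If the registers turn out to use LSB-0 numbering instead, only `ppc_bit` would need to change.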

My thinking is that instead of disconnecting the misbehaving devices, maybe the errors could be "forgiven" (but still blocked) and the device permitted to continue operating, possibly with some USB data loss from "writes to nowhere" or retries that may reduce performance. Or maybe if the issue is caused by some high address bits being set to random values, those bits could be masked-off so as to not trigger the errors and even avoid data loss.

So, my question is, is any of this possible? I know the simple solution for me is to just RMA the cards and avoid purchasing ASMedia-based USB host controllers in the future, but the fact that they still seem to work "mostly ok" on x86 systems (with the occasional kernel panics and BSODs reported by users) piques my curiosity and makes me wonder if maybe there's a way for me to have my cheap, buggy hardware cake and eat it, too.

Now, I'm a novice at kernel hacking, so I don't really know what I'm doing, but just for fun I did try to paper over the issue by adding an EEH handler to the xhci driver (https://paste.debian.net/hidden/16081515), but as you might expect, that didn't do anything but prevent further communication with the device. I also read a bunch of the PHB4 and IODA2 specs to see if maybe there'd be a way to implement that bit-masking thing I mentioned, but both of those documents are, uh, rather dry reading, so I haven't read them in their entirety, and I don't know enough about how this all works to try to search the text for what I need.

All that said, if anyone has any suggestions or comments, I'd be really interested to hear them, even if it's just to question why I'd go to such ridiculous lengths to try to get software to account for buggy hardware.

All the best,
Forest
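
P.S. To make the bit-masking idea a little more concrete, here is a toy model in Python. Everything in it is hypothetical: the window size, the example addresses, and the function names are invented for illustration, and it says nothing about whether the PHB4, OPAL, or the kernel can actually be made to do this. It only shows the arithmetic of "an address with stray high bits fails a bounds check, but passes once those bits are masked off".

```python
# Toy model only: none of these constants come from the PHB4 or IODA2 specs.
WINDOW_SIZE = 1 << 31          # hypothetical 2 GiB DMA window starting at 0
ADDR_MASK = WINDOW_SIZE - 1    # keeps only the bits that fit in the window


def in_window(addr: int) -> bool:
    """Stand-in for the address range check that raises the error."""
    return 0 <= addr < WINDOW_SIZE


def mask_high_bits(addr: int) -> int:
    """Drop any address bits above the window size."""
    return addr & ADDR_MASK


good = 0x1234_5000             # hypothetical valid DMA address
bad = good | (0xABC << 52)     # same address with garbage in the high bits

print(in_window(bad))                   # range check rejects the raw address
print(hex(mask_high_bits(bad)))         # masking recovers the low bits
print(in_window(mask_high_bits(bad)))   # masked address passes the check
```

Of course, this only helps if the bad addresses really are good addresses with extra high bits set, which I haven't confirmed.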