On Thu, Nov 17, 2022 at 02:22:10PM -0700, Jonathan Derrick wrote:
> I seem to have isolated the error mechanism for older kernels, but 6.2.0-rc2
> reliably segfaults my QEMU instance (something else to look into) and I don't
> have any 'real' hardware to test this on at the moment. It looks like several
> passthru commands are able to enqueue prior/during/after resetting/connecting.

I'm not seeing any problem with the latest nvme-qemu after several dozen
iterations of this test case. In that environment, the formats and resets
complete practically synchronously with the call, so everything proceeds
quickly. Is there anything special I need to change?

> The issue seems to be very heavily timing related, so the loop in the header is
> a lot more forceful in this approach.
>
> As far as the loop goes, I've noticed it will typically repro immediately or
> pass the whole test.

I can only get a possible repro in scenarios that have multi-second long,
serialized format times. Even then, it still appears that everything fixes
itself after waiting. Are you observing the same, or is it stuck forever in
your observations?

> +remove_and_rescan() {
> +	local pdev=$1
> +	echo 1 > /sys/bus/pci/devices/"$pdev"/remove
> +	echo 1 > /sys/bus/pci/rescan
> +}

This function isn't called anywhere.