On 11/21/2022 1:55 PM, Keith Busch wrote:
> On Thu, Nov 17, 2022 at 02:22:10PM -0700, Jonathan Derrick wrote:
>> I seem to have isolated the error mechanism for older kernels, but 6.2.0-rc2
>> reliably segfaults my QEMU instance (something else to look into) and I don't
>> have any 'real' hardware to test this on at the moment. It looks like several
>> passthru commands are able to enqueue prior/during/after resetting/connecting.
>
> I'm not seeing any problem with the latest nvme-qemu after several dozen
> iterations of this test case. In that environment, the formats and
> resets complete practically synchronously with the call, so everything
> proceeds quickly. Is there anything special I need to change?
>
I can still repro this with nvme-fixes tag, so I'll have to dig into it myself
Does the tighter loop in the test comment header produce results?

>> The issue seems to be very heavily timing related, so the loop in the header is
>> a lot more forceful in this approach.
>>
>> As far as the loop goes, I've noticed it will typically repro immediately or
>> pass the whole test.
>
> I can only get possible repro in scenarios that have multi-second long,
> serialized format times. Even then, it still appears that everything
> fixes itself after a waiting. Are you observing the same, or is it stuck
> forever in your observations?
In 5.19, it gets stuck forever with lots of formats outstanding and
controller stuck in resetting. I'll keep digging. Thanks Keith

>
>> +remove_and_rescan() {
>> +	local pdev=$1
>> +	echo 1 > /sys/bus/pci/devices/"$pdev"/remove
>> +	echo 1 > /sys/bus/pci/rescan
>> +}
>
> This function isn't called anywhere.