On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote: > On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote: > > Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory > > accesses during the hang. > > > > Invalid read at addr 0x102877002, size 2, region '(null)', reason: rejected > > Invalid write at addr 0x102877A44, size 2, region '(null)', reason: rejected > > ... > > > > It was traced down to virtio-console. Kexec works fine if virtio-console > > is not in use. > > virtio is not doing a lot of 16 bit reads. > Are these the reads: > > virtio_cread(vdev, struct virtio_console_config, cols, &cols); > virtio_cread(vdev, struct virtio_console_config, rows, &rows); > > ? > > write is a bit puzzling too. This one? > > bool vp_notify(struct virtqueue *vq) > { > /* we write the queue's selector into the notification register to > * signal the other end */ > iowrite16(vq->index, (void __iomem *)vq->priv); > return true; > } Given that we are talking about console issue, any suggestion on how to check? > > > > Looks like virtio-console continues to write to the MMIO even after > > underlying virtio-pci device is removed. > > You mention both MMIO and pci, I am confused. By MMIO, I mean accesses to PCI BARs. But it is only my *guess* on the situation, I have limited knowledge of the area. I am not drivers guy. > Removed by what? In what sense? So device_shutdown() iterates over all device and we hit the problem when we get to virtio-pci devices and call pci_device_shutdown on them. I *think* PCI BAR (or something else?) becomes unavailable after that but it is still accessed. > > > > The problem can be mitigated by removing all virtio devices on virtio > > bus shutdown. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > > Reported-by: Hongyu Ning <hongyu.ning@xxxxxxxxxxxxxxx> > > A bit worried about doing so much activity on shutdown, > and for all devices, too. I'd like to understand what > is going on a bit better - could be a symptom of > a bigger problem (e.g. missing handling for suprise > removal?). I probably should have marked the patch as RFC. The patch was intended to start conversation. I am not sure it is correct. This patch just happened to work in our setup. -- Kiryl Shutsemau / Kirill A. Shutemov