On Wed, Mar 9, 2022 at 3:04 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Wed, Mar 09, 2022 at 11:41:01AM +0800, Jason Wang wrote:
>
>
> On Wed, Mar 9, 2022 at 12:36 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> On Tue, Mar 08, 2022 at 03:19:16PM +0000, Marc Zyngier wrote:
> > On Tue, 19 Oct 2021 08:01:46 +0100,
> > Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >
> > > We used to synchronize pending MSI-X irq handlers via
> > > synchronize_irq(), this may not work for the untrusted device which
> > > may keep sending interrupts after reset which may lead unexpected
> > > results. Similarly, we should not enable MSI-X interrupt until the
> > > device is ready. So this patch fixes those two issues by:
> > >
> > > 1) switching to use disable_irq() to prevent the virtio interrupt
> > > handlers to be called after the device is reset.
> > > 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready()
> > >
> > > This can make sure the virtio interrupt handler won't be called before
> > > virtio_device_ready() and after reset.
> > >
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > ---
> > > drivers/virtio/virtio_pci_common.c | 27 +++++++++++++++++++++------
> > > drivers/virtio/virtio_pci_common.h | 6 ++++--
> > > drivers/virtio/virtio_pci_legacy.c | 5 +++--
> > > drivers/virtio/virtio_pci_modern.c | 6 ++++--
> > > 4 files changed, 32 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/
> virtio_pci_common.c
> > > index b35bb2d57f62..8d8f83aca721 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -24,8 +24,8 @@ MODULE_PARM_DESC(force_legacy,
> > > "Force legacy mode for transitional virtio 1 devices");
> > > #endif
> > >
> > > -/* wait for pending irq handlers */
> > > -void vp_synchronize_vectors(struct virtio_device *vdev)
> > > +/* disable irq handlers */
> > > +void vp_disable_cbs(struct virtio_device *vdev)
> > > {
> > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > int i;
> > > @@ -34,7 +34,20 @@ void vp_synchronize_vectors(struct virtio_device
> *vdev)
> > > synchronize_irq(vp_dev->pci_dev->irq);
> > >
> > > for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > - synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > + disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> > > +}
> > > +
> > > +/* enable irq handlers */
> > > +void vp_enable_cbs(struct virtio_device *vdev)
> > > +{
> > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > + int i;
> > > +
> > > + if (vp_dev->intx_enabled)
> > > + return;
> > > +
> > > + for (i = 0; i < vp_dev->msix_vectors; ++i)
> > > + enable_irq(pci_irq_vector(vp_dev->pci_dev, i));
> >
> > This results in a splat at boot time if you set maxcpus=<whatever>,
> > see below. Enabling interrupts that are affinity managed is *bad*. You
> > don't even know whether the CPU which is supposed to handle this is
> > online or not.
> >
> > The core kernel notices it, shouts and keeps the interrupt disabled,
> > but this should be fixed. The whole point of managed interrupts is to
> > let them be dealt with outside of the drivers, and tied into the CPUs
> > being brought up and down. If virtio needs (for one reason or another)
> > to manage interrupts on its own, so be it. But this patch isn't the
> > way to do it, I'm afraid.
> >
> > M.
>
> Thanks for reporting this!
>
> What virtio is doing here isn't unique however.
>
> If device driver is to be hardened against device sending interrupts
> while driver is initializing/cleaning up, it needs kernel to provide
> capability to disable these.
>
> If we then declare that that is impossible for managed interrupts
> then that will mean most devices can't use managed interrupts
> because ideally we'd have all drivers hardened.
>
>
> Or that probably means we need to use a driver specific mechanism as what we
> did for INTX instead of using NO_AUTO_EN.
>
> Thanks
What we did for INTX was pretty expensive, we justified this
by saying INTX handling involves expensive IO reads anyway
so it's in the noise.
We only add a READ_ONCE() on the callback like:
+ if (!READ_ONCE(vp_dev->intx_soft_enabled))
+ return IRQ_NONE;
+ return IRQ_NONE;
It should not be that expensive.
Let's see what does Thomas suggest.
Yes, sure.
Thanks
>
>
> Thomas I think you were the one who suggested enabling/disabling
> interrupts originally - thoughts?
>
> Feedback appreciated.
>
>
>
> > [ 3.434849] ------------[ cut here ]------------
> > [ 3.434850] WARNING: CPU: 0 PID: 93 at kernel/irq/chip.c:210
> irq_startup+0x10
> > e/0x120
> > [ 3.434861] Modules linked in: virtio_net(E+) net_failover(E) failover
> (E) vir
> > tio_blk(E+) bochs(E+) drm_vram_helper(E) drm_ttm_helper(E) ttm(E) ahci
> (E+) libah
> > ci(E) virtio_pci(E) virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E)
> virtio(E)
> > drm_kms_helper(E) cec(E) libata(E) crct10dif_pclmul(E) crct10dif_common
> (E) crc32
> > _pclmul(E) scsi_mod(E) i2c_i801(E) crc32c_intel(E) psmouse(E) i2c_smbus
> (E) scsi_
> > common(E) lpc_ich(E) virtio_ring(E) drm(E) button(E)
> > [ 3.434890] CPU: 0 PID: 93 Comm: systemd-udevd Tainted: G
> E 5.
> > 17.0-rc7-00020-gea4424be1688 #63
> > [ 3.434893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 0.0.0 02
> > /06/2015
> > [ 3.434897] RIP: 0010:irq_startup+0x10e/0x120
> > [ 3.434904] Code: c0 75 2b 4c 89 e7 31 d2 4c 89 ee e8 dc c5 ff ff 48
> 89 ef e8
> > 94 fe ff ff 41 89 c4 e9 33 ff ff ff e8 e7 ca ff ff e9 50 ff ff ff <0f>
> 0b eb ac
> > 0f 0b eb a8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> > [ 3.434906] RSP: 0018:ffff972c402bbbf0 EFLAGS: 00010002
> > [ 3.434908] RAX: 0000000000000004 RBX: 0000000000000001 RCX:
> 0000000000000040
> > [ 3.434912] RDX: 0000000000000000 RSI: ffffffffa768dee0 RDI:
> ffff8bcf8ce34648
> > [ 3.434913] RBP: ffff8bcfb007a800 R08: 0000000000000004 R09:
> ffffffffa74cb828
> > [ 3.434915] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000001
> > [ 3.434916] R13: ffff8bcf8ce34648 R14: ffff8bcf8d185c70 R15:
> 0000000000000200
> > [ 3.434918] FS: 00007f5b3179f8c0(0000) GS:ffff8bcffbc00000(0000)
> knlGS:00000
> > 00000000000
> > [ 3.434919] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3.434921] CR2: 000055ca47bab6b8 CR3: 000000017bc40003 CR4:
> 0000000000170ef0
> > [ 3.434928] Call Trace:
> > [ 3.434936] <TASK>
> > [ 3.434938] enable_irq+0x48/0x90
> > [ 3.434943] vp_enable_cbs+0x36/0x70 [virtio_pci]
> > [ 3.434948] virtblk_probe+0x457/0x7dc [virtio_blk]
> > [ 3.434954] virtio_dev_probe+0x1ae/0x280 [virtio]
> > [ 3.434959] really_probe+0x1f5/0x3d0
> > [ 3.434966] __driver_probe_device+0xfe/0x180
> > [ 3.434969] driver_probe_device+0x1e/0x90
> > [ 3.434971] __driver_attach+0xc0/0x1c0
> > [ 3.434974] ? __device_attach_driver+0xe0/0xe0
> > [ 3.434976] ? __device_attach_driver+0xe0/0xe0
> > [ 3.434978] bus_for_each_dev+0x78/0xc0
> > [ 3.434982] bus_add_driver+0x149/0x1e0
> > [ 3.434985] driver_register+0x8b/0xe0
> > [ 3.434987] ? 0xffffffffc01aa000
> > [ 3.434990] init+0x52/0x1000 [virtio_blk]
> > [ 3.434994] do_one_initcall+0x44/0x200
> > [ 3.435001] ? kmem_cache_alloc_trace+0x300/0x400
> > [ 3.435006] do_init_module+0x4c/0x260
> > [ 3.435013] __do_sys_finit_module+0xb4/0x120
> > [ 3.435018] do_syscall_64+0x3b/0xc0
> > [ 3.435027] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 3.435037] RIP: 0033:0x7f5b31c589b9
> > [ 3.435040] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> 48 89 f8
> > 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> 3d 01 f0
> > ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
> > [ 3.435042] RSP: 002b:00007ffc608fc198 EFLAGS: 00000246 ORIG_RAX:
> 00000000000
> > 00139
> > [ 3.435045] RAX: ffffffffffffffda RBX: 000055ca47ba8700 RCX:
> 00007f5b31c589b9
> > [ 3.435046] RDX: 0000000000000000 RSI: 00007f5b31de3e2d RDI:
> 0000000000000005
> > [ 3.435048] RBP: 0000000000020000 R08: 0000000000000000 R09:
> 000055ca47ba9030
> > [ 3.435049] R10: 0000000000000005 R11: 0000000000000246 R12:
> 00007f5b31de3e2d
> > [ 3.435050] R13: 0000000000000000 R14: 000055ca47ba7060 R15:
> 000055ca47ba8700
> > [ 3.435053] </TASK>
> > [ 3.435059] ---[ end trace 0000000000000000 ]---
> > [ 3.440593] vda: vda1 vda2 vda3
> > [ 3.445283] scsi host0: Virtio SCSI HBA
> > [ 3.450373] scsi 0:0:0:0: CD-ROM QEMU QEMU CD-ROM
> 2.5+ PQ
> > : 0 ANSI: 5
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
>
_______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization