Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

On Tue, Oct 25, 2011 at 02:21:01PM +0200, Jan Kiszka wrote:
> On 2011-10-25 14:05, Michael S. Tsirkin wrote:
> > On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
> >> On 2011-10-25 13:20, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-24 19:23, Michael S. Tsirkin wrote:
> >>>>> On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
> >>>>>> On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
> >>>>>>> On 2011-10-24 18:05, Michael S. Tsirkin wrote:
> >>>>>>>>> This is what I have in mind:
> >>>>>>>>>  - devices set PBA bit if MSI message cannot be sent due to mask (*)
> >>>>>>>>>  - core checks&clears PBA bit on unmask, injects message if bit was set
> >>>>>>>>>  - devices clear PBA bit if message reason is resolved before unmask (*)
> >>>>>>>>
> >>>>>>>> OK, but practically, when exactly does the device clear PBA?
> >>>>>>>
> >>>>>>> Consider a network adapter that signals messages in a RX ring: If the
> >>>>>>> corresponding vector is masked while the guest empties the ring, I
> >>>>>>> strongly assume that the device is supposed to take back the pending bit
> >>>>>>> in that case so that there is no interrupt injection on a later vector
> >>>>>>> unmask operation.
> >>>>>>>
> >>>>>>> Jan
> >>>>>>
> >>>>>> Do you mean virtio here?
> >>>>
> >>>> Maybe, but I'm also thinking of fully emulated devices.
> >>>
> >>> One thing seems certain: actual, assigned devices don't
> >>> have this fake "msi-x level" so they don't notify host
> >>> when that changes.
> >>
> >> But they have real PBA. We "just" need to replicate the emulated vector
> >> mask state into real hw. Doesn't this happen anyway when we disable the
> >> IRQ on the host?
> > 
> > Not immediately I think.
> > 
> >> If not, that may require a bit more work, maybe a special masking mode
> >> that can be requested by the managing backend of an assigned device from
> >> the MSI-X in-kernel service.
> > 
> > True. OTOH this might have cost (extra mmio) for the
> > doubtful benefit of making PBA values exact.
> 
> I think correctness comes before performance unless the latter hurts
> significantly.
> 
> > 
> >>>
> >>>>> Do you expect this optimization to give
> >>>>>> a significant performance gain?
> >>>>
> >>>> Hard to assess in general. But I have a silly guest here that obviously
> >>>> masks MSI vectors for each event. This currently not only kicks us into
> >>>> a heavy-weight exit, it also enforces serialization on qemu_global_mutex
> >>>> (while we have the rest already isolated).
> >>>
> >>> It's easy to see how MSIX mask support in kernel would help.
> >>> Not sure whether it's worth it to also add special APIs to
> >>> reduce the number of spurious interrupts for such silly guests.
> >>
> >> I do not get the latter point. What could be simplified (without making
> >> it incorrect) when ignoring excessive mask accesses?
> > 
> > Clearing PBA when we detect an empty ring in host is not required,
> > IMO. It's an optimization.
> 
> For virtio that might be true - as we are free to define the device
> behaviour to our benefit. What emulated real devices do is another thing.

Anything specific in mind?

> > 
> >> Also, if "sane"
> >> guests do not access the mask that frequently, why was in-kernel MSI-X
> >> MMIO proposed at all?
> > 
> > Apparently whether mask accesses happen a lot depends on the workload.
> > 
> >>>
> >>>>>
> >>>>> It would also be challenging to implement this in
> >>>>> a race free manner. Clearing on interrupt status read
> >>>>> seems straight-forward.
> >>>>
> >>>> With an in-kernel MSI-X MMIO handler, this race will be naturally
> >>>> unavoidable as there is no more global lock shared between table/PBA
> >>>> accesses and the device model. But, when using atomic bit ops, I don't
> >>>> think that will cause headache.
> >>>>
> >>>> Jan
> >>>
> >>> This is not the race I meant.  The challenge is for the device to
> >>> determine that it can clear the PBA.  atomic accesses on PBA won't help
> >>> here I think.
> >>
> >> The device knows best if the interrupt reason persists.
> > 
> > It might not know this unless notified by driver.
> > E.g. virtio drivers currently don't do interrupt status
> > reads.
> 
> Talking about real devices, they obviously know as they maintain the
> hardware state.

Not necessarily. It's quite common to keep the ring in coherent memory
allocated by the driver, not within the device; the state is then
maintained by driver and device together.

> > 
> >> It can
> >> synchronize MSI assertion and PBA bit clearance. If it clears "too
> >> late", then this reflects what may happen on real hw as well when host
> >> and device race for changing vector mask vs. device state. It's not
> >> stated that those changes need to be serialized inside the device, is it?
> >>
> >> Jan
> > 
> > Talking about emulated devices?  It's not certain that real
> > hardware clears PBA. Considering that no guests I know of use PBA ATM,
> > I would not be surprised if many devices had broken PBA support.
> 
> OK, if there are no conforming MSI-X devices out there,

Oh, I'm guessing some devices are conforming :)

> then we can
> forget about all the PBA maintenance beyond "set if message hit mask,
> cleared again on unmask". But I doubt that this is generally true.
> 
> Jan

We seem to get by basically with what you describe but I'm not
saying it's perfect, just that it's hard to make it perfect.
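
To make the "set if message hit mask, cleared again on unmask" rule
concrete, here is a minimal sketch in C. It is not KVM or QEMU code:
the struct, the function names (msix_notify, msix_mask, msix_unmask,
msix_retract), the 64-vector limit and the injection counter are all
hypothetical and exist only for illustration. msix_retract models the
optional "device clears PBA if the reason is resolved before unmask"
step discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NVEC 64 /* hypothetical: one 64-bit word of mask and PBA bits */

struct msix_state {
    _Atomic uint64_t mask;   /* bit n set: vector n is masked */
    _Atomic uint64_t pba;    /* bit n set: message pending on vector n */
    unsigned injected[NVEC]; /* stand-in for real interrupt injection */
};

static uint64_t vec_bit(unsigned vec)
{
    assert(vec < NVEC);
    return UINT64_C(1) << vec;
}

static void inject(struct msix_state *s, unsigned vec)
{
    s->injected[vec]++;
}

/* Device side: send the message, or record it in the PBA if masked. */
static void msix_notify(struct msix_state *s, unsigned vec)
{
    uint64_t bit = vec_bit(vec);

    if (atomic_load(&s->mask) & bit)
        atomic_fetch_or(&s->pba, bit);
    else
        inject(s, vec);
}

static void msix_mask(struct msix_state *s, unsigned vec)
{
    atomic_fetch_or(&s->mask, vec_bit(vec));
}

/* Core side: on unmask, test-and-clear the PBA bit, inject if it was set. */
static void msix_unmask(struct msix_state *s, unsigned vec)
{
    uint64_t bit = vec_bit(vec);

    atomic_fetch_and(&s->mask, ~bit);
    if (atomic_fetch_and(&s->pba, ~bit) & bit)
        inject(s, vec);
}

/* Optional device side: the interrupt reason went away while masked. */
static void msix_retract(struct msix_state *s, unsigned vec)
{
    atomic_fetch_and(&s->pba, ~vec_bit(vec));
}
```

Note what the atomics do and do not buy: each PBA update is atomic on
its own, but the mask check and the PBA set in msix_notify are two
separate steps, so a concurrent unmask can still slip in between them.
The sketch also says nothing about how a device would know that it is
safe to call msix_retract, which is the hard part.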

> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux