On 2011-10-25 13:20, Michael S. Tsirkin wrote:
> On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
>> On 2011-10-24 19:23, Michael S. Tsirkin wrote:
>>> On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
>>>> On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
>>>>> On 2011-10-24 18:05, Michael S. Tsirkin wrote:
>>>>>>> This is what I have in mind:
>>>>>>>  - devices set PBA bit if MSI message cannot be sent due to mask (*)
>>>>>>>  - core checks&clears PBA bit on unmask, injects message if bit was set
>>>>>>>  - devices clear PBA bit if message reason is resolved before unmask (*)
>>>>>>
>>>>>> OK, but practically, when exactly does the device clear PBA?
>>>>>
>>>>> Consider a network adapter that signals messages in an RX ring: if the
>>>>> corresponding vector is masked while the guest empties the ring, I
>>>>> strongly assume the device is supposed to take back the pending bit in
>>>>> that case so that no interrupt is injected on a later vector unmask
>>>>> operation.
>>>>>
>>>>> Jan
>>>>
>>>> Do you mean virtio here?
>>
>> Maybe, but I'm also thinking of fully emulated devices.
>
> One thing seems certain: actual, assigned devices don't
> have this fake "msi-x level", so they don't notify the host
> when that changes.

But they have a real PBA. We "just" need to replicate the emulated
vector mask state into real hw. Doesn't this happen anyway when we
disable the IRQ on the host? If not, that may require a bit more work,
maybe a special masking mode that the managing backend of an assigned
device can request from the in-kernel MSI-X service.

>
>>> Do you expect this optimization to give
>>> a significant performance gain?
>>
>> Hard to assess in general. But I have a silly guest here that obviously
>> masks MSI vectors for each event. This currently not only kicks us into
>> a heavy-weight exit, it also enforces serialization on qemu_global_mutex
>> (while we have the rest already isolated).
>
> It's easy to see how MSI-X mask support in the kernel would help.
> Not sure whether it's worth it to also add special APIs to
> reduce the number of spurious interrupts for such silly guests.

I don't get the latter point. What could be simplified (without making
it incorrect) by ignoring excessive mask accesses? Also, if "sane"
guests do not access the mask that frequently, why was in-kernel MSI-X
MMIO proposed at all?

>
>>>
>>> It would also be challenging to implement this in
>>> a race-free manner. Clearing on interrupt status read
>>> seems straightforward.
>>
>> With an in-kernel MSI-X MMIO handler, this race will naturally be
>> unavoidable as there is no longer a global lock shared between
>> table/PBA accesses and the device model. But when using atomic bit
>> ops, I don't think that will cause headaches.
>>
>> Jan
>
> This is not the race I meant. The challenge is for the device to
> determine that it can clear the PBA. Atomic accesses on the PBA won't
> help here, I think.

The device knows best whether the interrupt reason persists. It can
synchronize MSI assertion and PBA bit clearance. If it clears "too
late", then this reflects what may happen on real hw as well when host
and device race over changing the vector mask vs. the device state.
It's not stated anywhere that those changes need to be serialized
inside the device, is it?
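To make the intended set/clear protocol concrete, here is a minimal
sketch using kernel-style atomic bitops. All names (struct msix_state,
msix_vector_notify/retract/unmask, msix_send_message) are made up for
illustration; this is not an existing KVM or QEMU interface, just the
three bullet points above spelled out:

    #include <linux/types.h>
    #include <linux/bitops.h>   /* set_bit, clear_bit, test_and_clear_bit */

    #define MSIX_MAX_VECTORS 2048

    /* Illustrative per-device state; the real layout differs. */
    struct msix_state {
        unsigned long mask[BITS_TO_LONGS(MSIX_MAX_VECTORS)];
        unsigned long pba[BITS_TO_LONGS(MSIX_MAX_VECTORS)];
    };

    /* Actual MSI injection, provided elsewhere. */
    static void msix_send_message(struct msix_state *s, unsigned int vector);

    static bool msix_is_masked(struct msix_state *s, unsigned int vector)
    {
        return test_bit(vector, s->mask);
    }

    /* Device model: called whenever the device wants to raise a vector. */
    static void msix_vector_notify(struct msix_state *s, unsigned int vector)
    {
        if (msix_is_masked(s, vector)) {
            /* (*) message cannot be sent due to mask -> latch in PBA */
            set_bit(vector, s->pba);
            return;
        }
        msix_send_message(s, vector);
    }

    /* Device model: interrupt reason resolved before unmask, e.g. the
     * guest emptied the RX ring -> take the pending bit back. (*) */
    static void msix_vector_retract(struct msix_state *s, unsigned int vector)
    {
        clear_bit(vector, s->pba);
    }

    /* Core: on vector unmask, atomically consume a pending bit and
     * inject. test_and_clear_bit() ensures the core and the device can
     * race on the PBA without losing or duplicating a message; if the
     * device retracts "too late", the bit may already have been consumed
     * and injected, which matches what real hw can produce anyway. */
    static void msix_vector_unmask(struct msix_state *s, unsigned int vector)
    {
        clear_bit(vector, s->mask);
        if (test_and_clear_bit(vector, s->pba)) {
            msix_send_message(s, vector);
        }
    }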
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux