Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

Cédric Le Goater <clg@xxxxxxxx> · Tue, 29 Jan 2019 14:51:05 +0100

>>> Another general comment is that you seem to have written all this
>>> code assuming we are using HV KVM in a host running bare-metal.
>>
>> Yes. I didn't look at the other configurations. I thought that we could
>> use the kernel_irqchip=off option to begin with. A couple of checks
>> are indeed missing.
> 
> Using kernel_irqchip=off would mean that we would not be able to use
> the in-kernel XICS emulation, which would have a performance impact.

yes. But it is not supported today. Correct ? 

> We need an explicit capability for XIVE exploitation that can be
> enabled or disabled on the qemu command line, so that we can enforce a
> uniform set of capabilities across all the hosts in a migration
> domain.  And it's no good to say we have the capability when all
> attempts to use it will fail.  Therefore the kernel needs to say that
> it doesn't have the capability in a PR KVM guest or in a nested HV
> guest.

OK. I will work on adding a KVM_CAP_PPC_NESTED_IRQ_HV capability 
for future use.

>>> However, we could be using PR KVM (either in a bare-metal host or in a
>>> guest), or we could be doing nested HV KVM where we are using the
>>> kvm_hv module inside a KVM guest and using special hypercalls for
>>> controlling our guests.
>>
>> Yes. 
>>
>> It would be good to talk a little about the nested support (offline 
>> may be) to make sure that we are not missing some major interface that 
>> would require a lot of change. If we need to prepare ground, I think
>> the timing is good.
>>
>> The size of the IRQ number space might be a problem. It seems we 
>> would need to increase it considerably to support multiple nested 
>> guests. That said I haven't look much how nested is designed.  
> 
> The current design of nested HV is that the entire non-volatile state
> of all the nested guests is encapsulated within the state and
> resources of the L1 hypervisor.  That means that if the L1 hypervisor
> gets migrated, all of its guests go across inside it and there is no
> extra state that L0 needs to be aware of.  That would imply that the
> VP number space for the nested guests would need to come from within
> the VP number space for L1; but the amount of VP space we allocate to
> each guest doesn't seem to be large enough for that to be practical.

If the KVM XIVE device had some information on the max number of CPUs 
provisioned for the guest, we could optimize the VP allocation.

That might be a larger KVM topic though. There are some static limits 
on the number of CPUs in QEMU and in KVM, which have no relation AFAICT. 

Thanks,

C.