RE: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

Ram Pai <linuxram@xxxxxxxxxx> · Sat, 29 Feb 2020 14:51:40 -0800

On Sat, Feb 29, 2020 at 09:27:54AM +0100, Cédric Le Goater wrote:
> On 2/29/20 8:54 AM, Ram Pai wrote:
> > XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.
> > 
> > Hence Secure VM, must always default to XICS interrupt controller.
> 
> have you tried XIVE emulation 'kernel-irqchip=off' ? 

yes and it hangs. I think that option, continues to enable some variant
of XIVE in the VM.  There are some known deficiencies between KVM
and the ultravisor negotiation, resulting in a hang in the SVM.

> 
> > If XIVE is requested through kernel command line option "xive=on",
> > override and turn it off.
> 
> This is incorrect. It is negotiated through CAS depending on the FW
> capabilities and the KVM capabilities.

Yes I understand, qemu/KVM have predetermined a set of capabilties that
it can offer to the VM.  The kernel within the VM has a list of
capabilties it needs to operate correctly.  So both negotiate and
determine something mutually ammicable.

Here I am talking about the list of capabilities that the kernel is
trying to determine, it needs to operate correctly.  "xive=on" is one of
those capabilities the kernel is told by the VM-adminstrator, to enable.
Unfortunately if the VM-administrtor blindly requests to enable it, the
kernel must override it, if it knows that will be switching the VM into
a SVM soon. No point negotiating a capability with Qemu; through CAS,
if it knows it cannot handle that capability.

> 
> > If XIVE is the only supported platform interrupt controller; specified
> > through qemu option "ic-mode=xive", simply abort. Otherwise default to
> > XICS.
> 
> 
> I don't think it is a good approach to downgrade the guest kernel 
> capabilities this way. 
> 
> PAPR has specified the CAS negotiation process for this purpose. It 
> comes in two parts under KVM. First the KVM hypervisor advertises or 
> not a capability to QEMU. The second is the CAS negotiation process 
> between QEMU and the guest OS.

Unfortunately, this is not viable.  At the time the hypervisor
advertises its capabilities to qemu, the hypervisor has no idea whether
that VM will switch into a SVM or not.  The decision to switch into a
SVM is taken by the kernel running in the VM. This happens much later,
after the hypervisor has already conveyed its capabilties to the qemu, and
qemu has than instantiated the VM.

As a result, CAS in prom_init is the only place where this negotiation
can take place.

> 
> The SVM specifications might not be complete yet and if some features 
> are incompatible, I think we should modify the capabilities advertised 
> by the hypervisor : no XIVE in case of SVM. QEMU will automatically 
> use the fallback path and emulate the XIVE device, same as setting 
> 'kernel-irqchip=off'. 

As mentioned above, this would be an excellent approach, if the
Hypervisor was aware of the VM's intent to switch into a SVM.  Neither
the hypervisor knows, nor the qemu.  Only the kernel running within the
VM knows about it.

Do you still think, my approach is wrong?
RP