Il 11/03/2013 18:20, Gleb Natapov ha scritto: > On Mon, Mar 11, 2013 at 03:28:03PM +0100, Paolo Bonzini wrote: >> Il 11/03/2013 14:54, Gleb Natapov ha scritto: >>>> Setting the mp_state to INIT_RECEIVED is that interface, and it already >>>> works, for APs at least. This patch extends it to work for the BSP as well. >>> >>> It does not for AP either. If AP has vmx on mp_sate should not be set to >>> INIT_RECEIVED. mp_sate is a state as you can see from its name and we >>> already had a discussion on the generic device API about importance of >>> separating sending commands from setting state. There is a difference >>> between setting mp_state during migration and signaling INIT#. >> >> What does migration have to do with this? > > get|set_mpstate is used by migration. Actually this is primary reason > for this interface existence. Does it have to be the only one? >>>> In the corresponding userspace patch, I don't need to touch the CPU >>>> state at all. I can just signal the kernel. If I touch the CPU, I'll >>>> break the nested case, no matter how it is implemented. So far, the >>>> userspace did not have to worry about nested, and that's something that >>>> should be kept that way. >>> We are discussing two different things here. I'll try to separate them. >>> 1. BSP is broken WRT #INIT >>> 2. nested is broken WRT #INIT >>> >>> You are fixing 1 with your patches, for that I proposed much easier >>> solution (at last from kernel point of view): if BSP reset it in >>> userspace and make it runnable. Nested virt is still broken, but this is >>> not what you are fixing. >> >> It's not what I'm fixing, but I don't want to make the fix for nested > > What are you fixing then? Nested virt is not what I am fixing, but I'm trying to keep an eye on that (and the other INIT race) while doing these patches. >> virt unnecessarily more complicated. Nested virt needs to know about >> INIT and SIPI; redefining the meaning of INIT_RECEIVED and SIPI_RECEIVED >> makes it more complicated to reflect these events to L1. >> >>> For 2 much more involved fix is needed. Jan fixes it and it will require >>> signaling INIT# from userspace by other means than mp_sate because >>> signaling INIT# does not automatically means that mp_sate changes to >>> INIT_RECEIVED. >> >> In your interpretation of INIT_RECEIVED, no. In mine, yes... > > Your code shows different. With your patch setting mp_state to > INIT_RECEIVED makes vcpu non tunable. This is incorrect if INIT_RECEIVED > is "INIT# is triggered" interface. What do you mean by "non tunable"? In non-nested mode, the VCPU will reset immediately, as soon as it is re-entered. In nested mode, the VCPU will eat the INIT_RECEIVED and turn it into a vmexit. At least according to AMD's docs, the VMM has to reassert INIT if it wants the processor to actually process it [15.20.8 INIT support]. Intel's does not say it explicitly, but it doesn't say the opposite either. It seems to be the only that makes sense. >>>> If we move away from the INIT_RECEIVED and SIPI_RECEIVED states for >>>> in-kernel APIC -> VCPU communication, then the KVM_SET_MP_STATE ioctl >>>> will have to convert them to the right bits in the requests field or in >>>> the APIC state. But I'm starting to see less benefit from moving away >>>> from mp_state. >>>> >>> We are not moving away from mp_state, we are moving away from using >>> mp_state for signaling >> >> That's what I meant; sorry for the unclear abbreviation. > > Then we disagree. We do. Let's see _where_ exactly we disagree. >>> because with nested virt INIT does not always >>> change mp_state >> >> Why not? > > Because mp_state is the current state the vcpu is in. It can be > uninitialized, runnable, halted or wait for sipi. SDM says that > if nested virt is enabled vcpu does not enter wait for sipi state > on INIT#. Yes, but it still has to do something (a vmexit) and go back to RUNNING. So it needs signaling from userspace to the kernel. >> Which is why it's good to have the reset done in kernel space, >> not in user space. > > Without nested virt it does not really matter and if it is does not > really matter you do not add code to the kernel just because it is good. > With nested virt INIT# processing needs to go to the kernel. In some > cases INIT will cause reset, but you do not "do reset in kernel space", > you do "INIT# handling in kernel space". We agree on this. What I add is: let's define the API so that it is nested-friendly. This means having a signaling mechanism for userspace. I think you do not want mp_state to be this signaling mechanism. Why not? Can an existing ioctl be the alternative or do we need to invent a new one? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html