On Thu, Jun 03, 2021 at 10:13:30AM +0200, Claudio Fontana wrote: > On 6/1/21 8:48 PM, Eduardo Habkost wrote: > > +Vitaly > > > > On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote: > >> we need to expand features first, before we attempt to check them > >> in the accel realizefn code called by cpu_exec_realizefn(). > >> > >> At the same time we need checks for code_urev and host_cpuid_required, > >> and modifications to cpu->mwait to happen after the initial setting > >> of them inside the accel realizefn code. > > > > I miss an explanation why those 3 steps need to happen after > > accel realizefn. > > > > I'm worried by the fragility of the ordering. If there are > > specific things that must be done before/after > > cpu_exec_realizefn(), this needs to be clear in the code. > > Hi Eduardo, > > indeed the initialization and realization code for x86 appears to be very sensitive to ordering. > This continues to be the case after the fix. > > cpu_exec_realizefn > > > > > > >> > >> Partial Fix. > >> > >> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass") > > > > Shouldn't this be > > f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass") > > ? > > > > Also, it looks like part of the ordering change was made in > > commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in > > cpu_exec_realizefn"). > > > > > > > >> Signed-off-by: Claudio Fontana <cfontana@xxxxxxx> > >> --- > >> target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------ > >> 1 file changed, 28 insertions(+), 28 deletions(-) > >> > >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c > >> index 9e211ac2ce..6bcb7dbc2c 100644 > >> --- a/target/i386/cpu.c > >> +++ b/target/i386/cpu.c > >> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) > >> Error *local_err = NULL; > >> static bool ht_warned; > >> > >> - /* Process Hyper-V enlightenments */ > >> - x86_cpu_hyperv_realize(cpu); > > > > Vitaly, is this reordering going to affect the Hyper-V cleanup > > work you are doing? It seems harmless and it makes sense to keep > > the "realize" functions close together, but I'd like to confirm. > > > >> - > >> - cpu_exec_realizefn(cs, &local_err); > >> - if (local_err != NULL) { > >> - error_propagate(errp, local_err); > >> - return; > >> - } > >> - > >> - if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) { > >> - g_autofree char *name = x86_cpu_class_get_model_name(xcc); > >> - error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name); > >> - goto out; > >> - } > >> - > >> - if (cpu->ucode_rev == 0) { > >> - /* The default is the same as KVM's. */ > >> - if (IS_AMD_CPU(env)) { > >> - cpu->ucode_rev = 0x01000065; > >> - } else { > >> - cpu->ucode_rev = 0x100000000ULL; > >> - } > >> - } > >> - > >> - /* mwait extended info: needed for Core compatibility */ > >> - /* We always wake on interrupt even if host does not have the capability */ > >> - cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > >> - > >> if (cpu->apic_id == UNASSIGNED_APIC_ID) { > >> error_setg(errp, "apic-id property was not initialized properly"); > >> return; > >> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp) > >> & CPUID_EXT2_AMD_ALIASES); > >> } > >> > >> + /* Process Hyper-V enlightenments */ > >> + x86_cpu_hyperv_realize(cpu); > >> + > >> + cpu_exec_realizefn(cs, &local_err); > > > > I'm worried by the reordering of cpu_exec_realizefn(). That > > function does a lot, and reordering it might have even more > > unwanted side effects. > > > > I wonder if it wouldn't be easier to revert commit 30565f10e907 > > ("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn"). > > the part that is wrong in that commit does not I think have to do with where the accel_cpu::cpu_realizefn() is called, but rather > the move of the call to cpu_exec_realizefn (now including the accel_cpu::cpu_realizefn) to the very beginning of the function. Oh, I didn't notice commit 30565f10e907 also moved the cpu_exec_realizefn() call to the beginning of the function. So moving it back (like you do her) actually seems to be a good idea. > > This was done due to the fact that the accel-specific code needs to be called before the code: > > * if (cpu->ucode_rev == 0) { > > (meaning "use default if nothing accelerator specific has been set"), as this could be set by accel-specific code, > > * cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > > (as the mwait is written to by cpu "host" kvm/hvf-specific code when enabling cpu pm), > > * if (cpu->phys_bits && ... > > (as the phys bits can be set by calling the host CPUID) > > But I missed there that we cannot move the call before the expansion of cpu features, > as the accel realize code checks and enables additional features assuming expansion has already happened. > > This was what was breaking the cpu "host" phys bits, even after correcting the cpu instance initialization order, > as the KVM/HVF -specific code would do the adjustment of phys bits to match the host only if: > > * if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { > > > > > > > > >> + if (local_err != NULL) { > >> + error_propagate(errp, local_err); > >> + return; > >> + } > >> + > >> + if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) { > >> + g_autofree char *name = x86_cpu_class_get_model_name(xcc); > >> + error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name); > >> + goto out; > >> + } > >> + > >> + if (cpu->ucode_rev == 0) { > >> + /* The default is the same as KVM's. */ > >> + if (IS_AMD_CPU(env)) { > >> + cpu->ucode_rev = 0x01000065; > >> + } else { > >> + cpu->ucode_rev = 0x100000000ULL; > >> + } > >> + } > >> + > >> + /* mwait extended info: needed for Core compatibility */ > >> + /* We always wake on interrupt even if host does not have the capability */ > >> + cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > >> + > > > > The dependency between those lines and cpu_exec_realizefn() is > > completely unclear here. What can we do to make this clearer and > > less fragile? > > Should I add something similar to my comment above? > > There _is_ something already in the patch that I added as I detected these dependencies, > but I notably did not mention the mwait one, and missed completely the cpu expansion issue. > > It's in kvm-cpu.c: > > static bool kvm_cpu_realizefn(CPUState *cs, Error **errp) > { > /* > * The realize order is important, since x86_cpu_realize() checks if > * nothing else has been set by the user (or by accelerators) in > * cpu->ucode_rev and cpu->phys_bits. > * > * realize order: > * kvm_cpu -> host_cpu -> x86_cpu > */ > > Maybe there is a better place to document this, where we could also describe in more detail the other dependencies? I would describe it in (at least) two places: 1. Documentation of AccelCPUClass.cpu_realizefn() should indicate what is allowed and not allowed for people implementing accelerators. 2. Comments at x86_cpu_realizefn() indicating the dependencies and why ordering is important. > > On my side the main question would be: did I miss some other order dependency? > > Knowing exactly what the current code assumptions are, and where those dependencies lie > would be preferable I think compared with reverting the commits. Absolutely. I was just trying to figure out what's the simplest and most trivial fix possible for the issue. > > Something able to cover this with reliable tests would be a good way to feel more confident with the resolution, > to make sure that something else is not hiding in there.. Yes, this kind of bug should have been caught by automated test cases somehow. > > > > > Note that this is not a comment on this fix, specifically, but on > > how the initialization ordering is easy to break here. > > > > > >> /* For 64bit systems think about the number of physical bits to present. > >> * ideally this should be the same as the host; anything other than matching > >> * the host can cause incorrect guest behaviour. > >> -- > >> 2.26.2 > >> > > > > Thanks, > > Claudio > -- Eduardo