On Thu, Sep 14, 2023 at 10:05:57AM -0700, Dongli Zhang wrote: > > > On 9/14/23 2:11 AM, Tyler Stachecki wrote: > > On Thu, Sep 14, 2023 at 04:15:54AM -0300, Leonardo Bras wrote: > >> So, IIUC, the xfeatures from the source guest will be different than the > >> xfeatures of the target (destination) guest. Is that correct? > > > > Correct. > > > >> It does not seem right to me. I mean, from the guest viewpoint, some > >> features will simply vanish during execution, and this could lead to major > >> issues in the guest. > > I fully agree with this. > > I think the original commit ad856280ddea ("x86/kvm/fpu: Limit guest > user_xfeatures to supported bits of XCR0") is for the source server, not > destination server. > > That is: > > 1. Without the commit (src and dst), something bad may happen. > > 2. With the commit on src, issue is fixed. > > 3. With the commit only dst, it is expected that issue is not fixed. > > Therefore, from administrator's perspective, the bugfix should always be applied > no the source server, in order to succeed the migration. > > > BTW, we may not be able to use commit ad856280ddea in the Fixes tag. > > > > > My assumption is that the guest CPU model should confine access to registers > > that make sense for that (guest) CPU. > > > > e.g., take a host CPU capable of AVX-512 running a guest CPU model that only > > has AVX-256. If the guest suddenly loses the top 256 bits of %zmm*, it should > > not really be perceivable as %ymm architecturally remains unchanged. > > > > Though maybe I'm being too rash here? Is there a case where this assumption > > breaks down? > > > >> The idea here is that if the target (destination) host can't provide those > >> features for the guest, then migration should fail. > >> > >> I mean, qemu should fail the migration, and that's correct behavior. > >> Is it what is happening? > > > > Unfortunately, no, it is not... and that is biggest concern right now. > > > > I do see some discussion between Peter and you on this topic and see that > > there was an RFC to implement such behavior stemming from it, here: > > https://lore.kernel.org/qemu-devel/20220607230645.53950-1-peterx@xxxxxxxxxx/ > > I agree that bug is at QEMU side, not KVM side. > > It is better to improve at QEMU side. I agree fixing QEMU is the best solution we have. > > 4508 int kvm_arch_put_registers(CPUState *cpu, int level) > 4509 { > 4510 X86CPU *x86_cpu = X86_CPU(cpu); > 4511 int ret; > ... ... > 4546 ret = kvm_put_xsave(x86_cpu); > 4547 if (ret < 0) { > 4548 return ret; > 4549 } > ... ...--> the rest of kvm_arch_put_registers() won't execute !!! > > Thank you very much! > > Dongli Zhang > > > > > ... though I do not believe that work ever landed in the tree. Looking at > > qemu's master branch now, the error from kvm_arch_put_registers is just > > discarded in do_kvm_cpu_synchronize_post_init... > > > > ``` > > static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg) > > { > > kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE); > > cpu->vcpu_dirty = false; > > } > > ``` > > > > Best, > > Tyler >