On Fri, Sep 15, 2023 at 04:41:20AM -0300, Leonardo Bras wrote:
> Other than that, all I can think of is removing the features from guest:
>
> As you commented, there may be some features that would not be a problem
> to be removed, and also there may be features which are not used by the
> workload, and could be removed. But this would depend on the feature, and
> the workload, being a custom solution for every case.

Yes, the "fixup back" should be restricted to specific, verified cases.

> For this (removing guest features), from kernel side, I would suggest using
> SystemTap (and eBPF, IIRC). The procedures should be something like:
> - Try to migrate VM from host with older kernel: fail
> - Look at qemu error, which features are missing?
> - Are those features safely removable from guest ?
> - If so, get a SystemTap / eBPF script masking out the undesired bits.
> - Try the migration again, it should succeed.
>
> IIRC, this could also be done in qemu side, with a custom qemu:
> - Try to migrate VM from host with older kernel: fail
> - Look at qemu error, which features are missing?
> - Are those features safely removable from guest ?
> - If so, get a custom qemu which masks out the desired flags before the VM
>   starts
> - Live migrate (can be inside the source host) to the custom qemu
> - Live migrate from custom qemu to target host.
> - The custom qemu could be on an auxiliary host, and used only for this
>
> Yes, it's hard, takes time, and may not solve every case, but it gets a
> higher chance of the VM surviving in the long run.

Thank you for taking the time to thoroughly consider the issue and suggest
some ways out - I really appreciate it.

> But keep in mind this is a hack.
> Taking features from a live guest is not supported in any way, and has a
> high chance of crashing the VM.

OK - if there's no interest in the below, I will not push for including
this patch in the kernel tree any longer.

I do think the specific case below is what a vast majority of KVM users
will struggle with in the near future, though:

I have a test environment with Broadwell-based guests (which have only
256-bit AVX) running under Skylake (PKRU, AVX512, ...) hypervisors.

I added some pr_debug statements to a guest kernel running under one of
these hypervisors, with said hypervisor containing neither your patches nor
mine, and printed the guest's view of `fpu_kernel_cfg.max_features` at boot.

It was 0x7, or:

  XFEATURE_MASK_FP, XFEATURE_MASK_SSE, XFEATURE_MASK_YMM

Thus, I'm pretty sure that all that's happening here is that the guest's FP
context is having PKRU/ZMM state saved and restored needlessly by the
hypervisor. Stripping it on a live migration does not seem to have any ill
effects in all the testing I have done.

Cheers,
Tyler