Alexander Mikhalitsyn <alexander.mikhalitsyn@xxxxxxxxxxxxx> writes:

> Dear friends,
>
> Recently, we (in OpenVZ) noticed an interesting issue: L2 VMs hang on
> RHEL 7-based hosts with SVM (AMD).
>
> Let me describe our test configuration:
> - AMD EPYC 7443P (Milan) or AMD EPYC 7261 (Rome)
> - RHEL 7-based kernel on the host node
> ... and, most importantly:
>
> L0     -> L1           -> L2
> RHEL 7 -> RHEL 7       -> RHEL 7                   *works*
> RHEL 7 -> RHEL 7       -> RHEL 8                   *works*
> RHEL 7 -> RHEL 7       -> recent Fedora            *works*
> RHEL 7 -> RHEL 8       -> RHEL 7                   *L2 hang*
> RHEL 7 -> fresh Fedora -> RHEL 7                   *L2 hang*
>
> or even more:
> RHEL 7 -> RHEL 7       -> *any tested Linux guest* *works*
> RHEL 7 -> RHEL 8       -> *any tested Linux guest* *L2 hang*
>
> but at the same time:
> RHEL 8 -> RHEL 8       -> *any tested Linux guest* *works*
>
> This was the key observation, so I started bisecting the L1 kernel for a
> hint. The bisection pointed to commit:
> c9d40913 ("KVM: x86: enable event window in inject_pending_event")
>
> I immediately tried reverting it in the CentOS 8 kernel and retrying the
> test, and it... works! To conclude: with an *old* kernel on the host and a
> *sufficiently new* kernel in L1, L2 is totally broken (SVM only).
>
> I've also tried porting this patch to the L0 kernel to check whether that
> fixes the issue. And yes, it works. I wonder whether this information will
> be useful to KVM developers and users.
>
> My attempt to port it to the RHEL 7 kernel:
> https://lists.openvz.org/pipermail/devel/2022-June/079776.html

Thanks for the investigation!

FWIW, nesting was never supported in RHEL7. It was disabled by default and
only worked to a certain extent on Intel. By the time we stopped rebasing
KVM in RHEL7, nested SVM was still a trainwreck, even upstream.

>
> Perhaps I should port this patch to the stable kernels too and send it?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.9.320&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.14.285&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.249&qt=grep&q=enable+event+window+in+inject_pending_event
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.4.201&qt=grep&q=enable+event+window+in+inject_pending_event
>
> So the 4.9, 4.14, 4.19 and 5.4 kernels lack this patch.

Personally, I wouldn't bother with anything below 5.4: nSVM is in very poor
shape there, and fixing one problem will just create the illusion that it
is 'supported'.

>
> I haven't checked that yet, but it looks like, for instance, a
>
> L0  -> L1   -> L2
> 5.4 -> 5.10 -> *any kernel version*
>
> setup will hang on SVM.

Cc: Max, who fixed a long list of issues in nSVM.

-- 
Vitaly