On Thu, May 16, 2024 at 07:56:18AM +0800, Edgecombe, Rick P wrote:
> On Wed, 2024-05-15 at 15:47 -0700, Sean Christopherson wrote:
> > > I didn't gather there was any proof of this. Did you have any hunch either
> > > way?
> >
> > I doubt the guest was able to access memory it shouldn't have been able to
> > access. But that's a moot point, as the bigger problem is that, because we
> > have no idea what's at fault, KVM can't make any guarantees about the safety
> > of such a flag.
> >
> > TDX is a special case where we don't have a better option (we do have other
> > options, they're just horrible). In other words, the choice is essentially
> > to either:
> >
> > (a) cross our fingers and hope that the problem is limited to shared memory
> >     with QEMU+VFIO, i.e. and doesn't affect TDX private memory.
> >
> > or
> >
> > (b) don't merge TDX until the original regression is fully resolved.
> >
> > FWIW, I would love to root cause and fix the failure, but I don't know how
> > feasible that is at this point.

Me too. So curious about what's exactly broken.

> If we think it is not a security issue, and we don't even know if it can be hit
> for TDX, then I'd be included to go with (a). Especially since we are just
> aiming for the most basic support, and don't have to worry about regressions in
> the classical sense.
>
> I'm not sure how easy it will be to root cause it at this point. Hopefully Yan
> will be coming online soon. She mentioned some previous Intel effort to
> investigate it. Presumably we would have to start with the old kernel that
> exhibited the issue. If it can still be found...

I tried to reproduce it under Weijiang's direction, though my NVIDIA card was
slightly different from the one Weijiang used. However, I failed. I'm not sure
whether that was because I did it remotely or because I didn't spend enough
time on it (since it's not an official task assigned to me and I just did it
out of curiosity).
If you think it's worthwhile, I would like to try again locally to see if I am
lucky enough to reproduce and root-cause it. But is it possible for TDX not to
be blocked on this bug/regression?