* Sean Christopherson (seanjc@xxxxxxxxxx) wrote:
> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > Or, is there some mechanism that prevent guest-private memory from being
> > > accessed in random host kernel code?
>
> Or random host userspace code...
>
> > So I'm currently under the impression that random host->guest accesses
> > should not happen if not previously agreed upon by both.
>
> Key word "should".
>
> > Because, as explained on IRC, if host touches a private guest page,
> > whatever the host does to that page, the next time the guest runs, it'll
> > get a #VC where it will see that that page doesn't belong to it anymore
> > and then, out of paranoia, it will simply terminate to protect itself.
> >
> > So cloud providers should have an interest to prevent such random stray
> > accesses if they wanna have guests. :)
>
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

Would it necessarily have been a host bug? A guest telling the host a
bad GPA to DMA into would trigger this wouldn't it?

Still; I wonder if it's best to kill the guest - maybe it's best for
the host to kill the guest and leave behind diagnostics of what
happened; for someone debugging the crash, it's going to be less useful
to know that page X was wrongly accessed (which is what the guest would
see), and more useful to know that it was the kernel's vhost-... driver
that accessed it.

Dave

> On Fri, Nov 12, 2021, Peter Gonda wrote:
> > Here is an alternative to the current approach: On RMP violation (host
> > or userspace) the page fault handler converts the page from private to
> > shared to allow the write to continue. This pulls from s390’s error
> > handling which does exactly this. See ‘arch_make_page_accessible()’.
>
> Ah, after further reading, s390 does _not_ do implicit private=>shared conversions.
>
> s390's arch_make_page_accessible() is somewhat similar, but it is not a direct
> comparison. IIUC, it exports and integrity protects the data and thus preserves
> the guest's data in an encrypted form, e.g. so that it can be swapped to disk.
> And if the host corrupts the data, attempting to convert it back to secure on a
> subsequent guest access will fail.
>
> The host kernel's handling of the "convert to secure" failures doesn't appear to
> be all that robust, e.g. it looks like there are multiple paths where the error
> is dropped on the floor and the guest is resumed, but IMO soft hanging the guest
> is still better than inducing a fault in the guest, and far better than potentially
> coercing the guest into reading corrupted memory ("spurious" PVALIDATE). And s390's
> behavior is fixable since it's purely a host error handling problem.
>
> To truly make a page shared, s390 requires the guest to call into the ultravisor
> to make a page shared. And on the host side, the host can pin a page as shared
> to prevent the guest from unsharing it while the host is accessing it as a shared
> page.
>
> So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself,
> but is quite different in that (AFAICT) s390 does not create an attack surface where
> a malicious or buggy host userspace can induce faults in the guest, or worst case in
> SNP, exploit a buggy guest into accepting and accessing corrupted data.
>
> It's also different in that s390 doesn't implicitly convert between shared and
> private. Functionally, it doesn't really change the end result because a buggy
> host that writes guest private memory will DoS the guest (by inducing a #VC or
> corrupting exported data), but at least for s390 there's a sane, legitimate use
> case for accessing guest private memory (swap and maybe migration?), whereas for
> SNP, IMO implicitly converting to shared on a host access is straight up wrong.
>
> > Additionally it adds less complexity to the SNP kernel patches, and
> > requires no new ABI.
>
> I disagree, this would require "new" ABI in the sense that it commits KVM to
> supporting SNP without requiring userspace to initiate any and all conversions
> between shared and private. Which in my mind is the big elephant in the room:
> do we want to require new KVM (and kernel?) ABI to allow/force userspace to
> explicitly declare guest private memory for TDX _and_ SNP, or just TDX?
>
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK