On Fri, Nov 12, 2021, Borislav Petkov wrote: > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote: > > Or, is there some mechanism that prevent guest-private memory from being > > accessed in random host kernel code? Or random host userspace code... > So I'm currently under the impression that random host->guest accesses > should not happen if not previously agreed upon by both. Key word "should". > Because, as explained on IRC, if host touches a private guest page, > whatever the host does to that page, the next time the guest runs, it'll > get a #VC where it will see that that page doesn't belong to it anymore > and then, out of paranoia, it will simply terminate to protect itself. > > So cloud providers should have an interest to prevent such random stray > accesses if they wanna have guests. :) Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong. On Fri, Nov 12, 2021, Peter Gonda wrote: > Here is an alternative to the current approach: On RMP violation (host > or userspace) the page fault handler converts the page from private to > shared to allow the write to continue. This pulls from s390’s error > handling which does exactly this. See ‘arch_make_page_accessible()’. Ah, after further reading, s390 does _not_ do implicit private=>shared conversions. s390's arch_make_page_accessible() is somewhat similar, but it is not a direct comparison. IIUC, it exports and integrity protects the data and thus preserves the guest's data in an encrypted form, e.g. so that it can be swapped to disk. And if the host corrupts the data, attempting to convert it back to secure on a subsequent guest access will fail. The host kernel's handling of the "convert to secure" failures doesn't appear to be all that robust, e.g. it looks like there are multiple paths where the error is dropped on the floor and the guest is resumed , but IMO soft hanging the guest is still better than inducing a fault in the guest, and far better than potentially coercing the guest into reading corrupted memory ("spurious" PVALIDATE). And s390's behavior is fixable since it's purely a host error handling problem. To truly make a page shared, s390 requires the guest to call into the ultravisor to make a page shared. And on the host side, the host can pin a page as shared to prevent the guest from unsharing it while the host is accessing it as a shared page. So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself, but is quite different in that (AFAICT) s390 does not create an attack surface where a malicious or buggy host userspace can induce faults in the guest, or worst case in SNP, exploit a buggy guest into accepting and accessing corrupted data. It's also different in that s390 doesn't implicitly convert between shared and private. Functionally, it doesn't really change the end result because a buggy host that writes guest private memory will DoS the guest (by inducing a #VC or corrupting exported data), but at least for s390 there's a sane, legitimate use case for accessing guest private memory (swap and maybe migration?), whereas for SNP, IMO implicitly converting to shared on a host access is straight up wrong. > Additionally it adds less complexity to the SNP kernel patches, and > requires no new ABI. I disagree, this would require "new" ABI in the sense that it commits KVM to supporting SNP without requiring userspace to initiate any and all conversions between shared and private. Which in my mind is the big elephant in the room: do we want to require new KVM (and kernel?) ABI to allow/force userspace to explicitly declare guest private memory for TDX _and_ SNP, or just TDX?