On 2024-08-06 07:10-0700 David Hildenbrand wrote:
> > While guest_memfd is not available to be mapped by userspace, it is
> > still accessible through the kernel's direct map. This means that in
> > scenarios where guest-private memory is not hardware protected, it can
> > be speculatively read and its contents potentially leaked through
> > hardware side-channels. Removing guest-private memory from the direct
> > map, thus mitigates a large class of speculative execution issues
> > [1, Table 1].
>
> I think you have to point out here that the speculative execution issues
> are primarily only an issue when guest_memfd private memory is used
> without TDX and friends where the memory would be encrypted either way.
>
> Or am I wrong?

Actually, I'm not sure how much protection CoCo solutions offer in this
regard. I'd love to hear more from Intel and AMD on this, but it looks
like they are not targeting full coverage for these types of attacks
(beyond protecting guest mitigation settings from manipulation by the
host). For example, see this selection from AMD's 2020 whitepaper [1] on
SEV-SNP:

"There are certain classes of attacks that are not in scope for any of
these three features. Architectural side channel attacks on CPU data
structures are not specifically prevented by any hardware means. As with
standard software security practices, code which is sensitive to such
side channel attacks (e.g., cryptographic libraries) should be written
in a way which helps prevent such attacks."

And:

"While SEV-SNP offers guests several options when it comes to protection
from speculative side channel attacks and SMT, it is not able to protect
against all possible side channel attacks. For example, traditional side
channel attacks on software such as PRIME+PROBE are not protected by
SEV-SNP."

Intel's docs also indicate guests need to protect themselves in some
cases saying, "TD software should be aware that potentially untrusted
software running outside a TD may be able to influence conditional
branch predictions of software running in a TD" [2] and "a TDX guest VM
is no different from a legacy guest VM in terms of protecting this
userspace <-> OS kernel boundary" [3]. But these focus on hardening
kernel & software within the guest.

What's not clear to me is what happens during transient execution when
the host kernel attempts to access a page in physical memory that
belongs to a guest. I assume if it only happens transiently, it will not
result in a machine check like it would if the instructions were
actually retired. As far as I can tell encryption happens between the
CPU & main memory, so cache contents will be plaintext. This seems to
leave open the possibility of the host kernel retrieving the plaintext
cache contents with a transient execution attack. I assume vendors have
controls in place to stop this, but Foreshadow/L1TF is a good example of
one place this fell apart for SGX [4].

All that said, we're also dependent on hardware not being subject to
L1TF-style issues for the currently proposed non-CoCo method to be
effective. We're simply clearing the Present bit while the physmap PTE
still points to the guest physical page. This was found to be
exploitable across OS & VMM boundaries on Intel server parts before
Cascade Lake [5] (thanks to Claudio for highlighting this).

So that's a long way of saying TDX may offer similar protection, but not
because of encryption.

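A rough illustration of the Present-bit point above, written as plain
userspace C rather than kernel code. The macro and helper names here are
made up for this sketch and are not the kernel's; the only thing assumed
is the standard x86-64 4 KiB PTE layout (bit 0 = Present, bit 1 =
Read/Write, bits 12-51 = physical frame address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; layout is the x86-64 4 KiB PTE. */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* bits 12..51 */

/* Build a PTE from a page-aligned physical address and flag bits. */
static uint64_t pte_make(uint64_t phys, uint64_t flags)
{
    assert((phys & ~PTE_ADDR_MASK) == 0); /* must fit in bits 12..51 */
    return phys | flags;
}

/* Clear only the Present bit; the address bits are left untouched. */
static uint64_t pte_clear_present(uint64_t pte)
{
    return pte & ~PTE_PRESENT;
}

static bool pte_is_present(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}

/* Physical address still encoded in the entry, present or not. */
static uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}
```

For example, pte_clear_present(pte_make(0x12345000, PTE_PRESENT |
PTE_WRITABLE)) is no longer present, yet pte_phys() on it still yields
0x12345000. On L1TF-affected parts, that leftover address is what a
speculative load can use to hit data sitting in L1D.
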
Derek

[1] https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf#page=19
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/trusted-domain-security-guidance-for-developers.html
[3] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#transient-execution-attacks-and-their-mitigation
[4] https://foreshadowattack.eu/foreshadow.pdf
[5] https://foreshadowattack.eu/foreshadow-NG.pdf