On Wed, May 26, 2021 at 07:46:52PM +0000, Sean Christopherson wrote: > On Fri, May 21, 2021, Kirill A. Shutemov wrote: > > Hi Sean, > > > > The core patch of the approach we've discussed before is below. It > > introduce a new page type with the required semantics. > > > > The full patchset can be found here: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only > > > > but only the patch below is relevant for TDX. QEMU patch is attached. > > Can you post the whole series? I hoped to get it posted as part of TDX host enabling. As it is the feature is incomplete for pure KVM. I didn't implement on KVM side checks that provided by TDX module/hardware, so nothing prevents the same page to be added to multiple KVM instances. > The KVM behavior and usage of FOLL_GUEST is very relevant to TDX. The patch can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=kvm-unmapped-guest-only&id=2cd6c2c20528696a46a2a59383ca81638bf856b5 > > CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate > > for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running > > TDX guest. > > This behavior in particular is relevant; KVM should provide FOLL_GUEST iff the > access is private or the VM type doesn't differentiate between private and > shared. I added FOL_GUEST if the KVM instance has the feature enabled. On top of that TDX-specific code has to check that the page is in fact PageGuest() before inserting it into private SEPT. The scheme makes sure that user-accessible memory cannot be not added as private to TD. > > When page get inserted into private sept we must make sure it is > > PageGuest() or SIGBUS otherwise. > > More KVM feedback :-) > > Ideally, KVM will synchronously exit to userspace with detailed information on > the bad behavior, not do SIGBUS. Hopefully that infrastructure will be in place > sooner than later. > > https://lkml.kernel.org/r/YKxJLcg/WomPE422@xxxxxxxxxx My experiments are still v5.11, but I can rebase to whatever needed once the infrastructure hits upstream. > > Inserting PageGuest() into shared is fine, but the page will not be accessible > > from userspace. > > Even if it can be functionally fine, I don't think we want to allow KVM to map > PageGuest() as shared memory. The only reason to map memory shared is to share > it with something, e.g. the host, that doesn't have access to private memory, so > I can't envision a use case. > > On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while > always passing FOLL_GUEST would require manually zapping. Manual zapping isn't > a big deal, but I do think it can be avoided if userspace must either remap the > hva or define a new KVM memslot (new gpa->hva), both of which will automatically > zap any existing translations. > > Aha, thought of a concrete problem. If KVM maps PageGuest() into shared memory, > then KVM must ensure that the page is not mapped private via a different hva/gpa, > and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA > enforcement only applies to private memory. The explicit "VM_WRITE | VM_SHARED" > requirement below makes me think this wouldn't be prevented. Hm. I didn't realize that TDX module doesn't prevent the same page to be used as shared and private at the same time. Omitting FOLL_GUEST for shared memory doesn't look like a right approach. IIUC, it would require the kernel to track what memory is share and what private, which defeat the purpose of the rework. I would rather enforce !PageGuest() when share SEPT is populated in addition to enforcing PageGuest() fro private SEPT. Do you see any problems with this? > Oh, and the other nicety is that I think it would avoid having to explicitly > handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all > memory exposed to KVM must be !PageGuest(), then it is also eligible for > copy_{to,from}_user(). copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE. Or do I miss your point? > > > Any feedback is welcome. > > > > -------------------------------8<------------------------------------------- > > > > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> > > Date: Fri, 16 Apr 2021 01:30:48 +0300 > > Subject: [PATCH] mm: Introduce guest-only pages > > > > PageGuest() pages are only allowed to be used as guest memory. Userspace > > is not allowed read from or write to such pages. > > > > On page fault, PageGuest() pages produce PROT_NONE page table entries. > > Read or write there will trigger SIGBUS. Access to such pages via > > syscall leads to -EIO. > > > > The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page > > fault to VM_GUEST VMA produces PageGuest() page. > > > > Only shared tmpfs/shmem mappings are supported. > > Is limiting this to tmpfs/shmem only for the PoC/RFC, or is it also expected to > be the long-term behavior? I expect it to be enough to cover all relevant cases, no? Note that MAP_ANONYMOUS|MAP_SHARED also fits here. -- Kirill A. Shutemov