On Wed, Apr 12, 2023 at 06:07:28PM -0700, Sean Christopherson wrote:
> On Wed, Jan 25, 2023, Kirill A. Shutemov wrote:
> > On Wed, Jan 25, 2023 at 12:20:26AM +0000, Sean Christopherson wrote:
> > > On Tue, Jan 24, 2023, Liam Merwick wrote:
> > > > On 14/01/2023 00:37, Sean Christopherson wrote:
> > > > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > > > > This patch series implements KVM guest private memory for confidential
> > > > > > computing scenarios like Intel TDX[1]. If a TDX host accesses
> > > > > > TDX-protected guest memory, a machine check can happen, which can
> > > > > > further crash the running host system; this is terrible for
> > > > > > multi-tenant configurations. The host accesses include those from KVM
> > > > > > userspace like QEMU. This series addresses KVM userspace-induced
> > > > > > crashes by introducing new mm and KVM interfaces so KVM userspace can
> > > > > > still manage guest memory via an fd-based approach, but can never
> > > > > > access the guest memory content.
> > > > > >
> > > > > > The patch series touches both core mm and KVM code. I would appreciate
> > > > > > it if Andrew/Hugh and Paolo/Sean could review and pick up these
> > > > > > patches. Any other reviews are always welcome.
> > > > > >   - 01: mm change, target for the mm tree
> > > > > >   - 02-09: KVM changes, target for the KVM tree
> > > > >
> > > > > A version with all of my feedback, plus reworked versions of Vishal's
> > > > > selftest, is available here:
> > > > >
> > > > >   git@xxxxxxxxxx:sean-jc/linux.git x86/upm_base_support
> > > > >
> > > > > It compiles and passes the selftest, but it's otherwise barely tested.
> > > > > There are a few todos (2 I think?) and many of the commits need
> > > > > changelogs, i.e. it's still a WIP.
> > > >
> > > > When running LTP (https://github.com/linux-test-project/ltp) on the v10
> > > > bits (and also with Sean's branch above) I encounter the following NULL
> > > > pointer dereference with testcases/kernel/syscalls/madvise/madvise01
> > > > (100% reproducible).
> > > >
> > > > It appears that in restrictedmem_error_page(),
> > > > inode->i_mapping->private_data is NULL in the
> > > > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) loop,
> > > > but I don't know why.
> > >
> > > Kirill, can you take a look?  Or pass the buck to someone who can? :-)
> >
> > The patch below should help.
> >
> > diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
> > index 15c52301eeb9..39ada985c7c0 100644
> > --- a/mm/restrictedmem.c
> > +++ b/mm/restrictedmem.c
> > @@ -307,14 +307,29 @@ void restrictedmem_error_page(struct page *page, struct address_space *mapping)
> >
> >  	spin_lock(&sb->s_inode_list_lock);
> >  	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > -		struct restrictedmem *rm = inode->i_mapping->private_data;
> >  		struct restrictedmem_notifier *notifier;
> > -		struct file *memfd = rm->memfd;
> > +		struct restrictedmem *rm;
> >  		unsigned long index;
> > +		struct file *memfd;
> >
> > -		if (memfd->f_mapping != mapping)
> > +		if (atomic_read(&inode->i_count))
>
> Kirill, should this be
>
> 	if (!atomic_read(&inode->i_count))
> 		continue;
>
> i.e. skip unreferenced inodes, not skip referenced inodes?

Ouch. Yes.

But looking at other instances of s_inodes usage, I think we can drop the
check altogether. An inode cannot be completely freed until it is removed
from the s_inodes list.

While there, replace list_for_each_entry_safe() with list_for_each_entry(),
since we don't remove anything from the list.
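FWIW, the usual pattern for walking sb->s_inodes elsewhere in the kernel
looks roughly like the sketch below (loosely modeled on drop_pagecache_sb()
in fs/drop_caches.c; illustrative only, not the restrictedmem code).
Teardown is detected via i_state under i_lock rather than by guessing from
i_count, and the cursor inode is pinned with __iget() before the locks are
dropped:

	struct inode *inode, *toput_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		/* Skip inodes that are being set up or torn down. */
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		/* Pin the inode before dropping the locks to act on it. */
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		/* ... operate on inode->i_mapping here ... */

		/*
		 * Drop the previous reference only now: holding a
		 * reference on the current inode keeps it on s_inodes,
		 * so the iteration can safely continue after relocking.
		 */
		iput(toput_inode);
		toput_inode = inode;

		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(toput_inode);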
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 55e99e6c09a1..8e8a4420d3d1 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -194,22 +194,19 @@ static int restricted_error_remove_page(struct address_space *mapping,
 					struct page *page)
 {
 	struct super_block *sb = restrictedmem_mnt->mnt_sb;
-	struct inode *inode, *next;
+	struct inode *inode;
 	pgoff_t start, end;
 
 	start = page->index;
 	end = start + thp_nr_pages(page);
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		struct restrictedmem_notifier *notifier;
 		struct restrictedmem *rm;
 		unsigned long index;
 		struct file *memfd;
 
-		if (atomic_read(&inode->i_count))
-			continue;
-
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);

-- 
  Kiryl Shutsemau / Kirill A. Shutemov