On Thu, Feb 16, 2023 at 03:21:21PM +0530, Nikunj A. Dadhania wrote:
>
> > +static struct file *restrictedmem_file_create(struct file *memfd)
> > +{
> > +	struct restrictedmem_data *data;
> > +	struct address_space *mapping;
> > +	struct inode *inode;
> > +	struct file *file;
> > +
> > +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +	if (!data)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	data->memfd = memfd;
> > +	mutex_init(&data->lock);
> > +	INIT_LIST_HEAD(&data->notifiers);
> > +
> > +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> > +	if (IS_ERR(inode)) {
> > +		kfree(data);
> > +		return ERR_CAST(inode);
> > +	}
>
> alloc_anon_inode() uses new_pseudo_inode() to get the inode. As per the comment, new inode
> is not added to the superblock s_inodes list.

Another issue somewhat related to alloc_anon_inode() is that the shmem
code in some cases assumes the inode struct was allocated via
shmem_alloc_inode(), which allocates a struct shmem_inode_info: a
superset of struct inode with additional fields for things like
spinlocks, reached from the inode via SHMEM_I(). These additional
fields don't get allocated/initialized in the case of restrictedmem,
so when restrictedmem_getattr() passes the inode on to the shmem
handler, it can cause a crash.
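To make the layout problem concrete, here is a minimal userspace model
(plain C, not kernel code). All names are hypothetical stand-ins:
struct model_inode for struct inode, struct model_shmem_info for
struct shmem_inode_info, model_shmem_i() for SHMEM_I(), and the two
allocators for shmem_alloc_inode() and alloc_anon_inode(). The real
structures have many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(): step back from an embedded
 * member to the start of the structure that contains it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct inode. */
struct model_inode {
	int is_shmem;	/* model-only tag, set by model_shmem_alloc_inode() */
	long nrpages;
};

/* Hypothetical stand-in for struct shmem_inode_info: extra fields
 * followed by the embedded inode. */
struct model_shmem_info {
	int lock;	/* stands in for the spinlock shmem_getattr() takes */
	long alloced;
	struct model_inode vfs_inode;
};

/* Analogous to SHMEM_I(): only valid if the inode really is embedded
 * in a model_shmem_info. */
static struct model_shmem_info *model_shmem_i(struct model_inode *inode)
{
	return container_of(inode, struct model_shmem_info, vfs_inode);
}

/* Analogous to shmem_alloc_inode(): allocates the whole superset. */
static struct model_inode *model_shmem_alloc_inode(void)
{
	struct model_shmem_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;
	info->vfs_inode.is_shmem = 1;
	return &info->vfs_inode;
}

/* Analogous to alloc_anon_inode(): allocates only the bare inode, so
 * there is no surrounding model_shmem_info (and no lock) to reach. */
static struct model_inode *model_anon_alloc_inode(void)
{
	return calloc(1, sizeof(struct model_inode));
}

/* Analogous to shmem_getattr(): touches the lock in the container. */
static int model_shmem_getattr(struct model_inode *inode, long *alloced)
{
	struct model_shmem_info *info;

	/* Model-only guard so this demo never reads unallocated memory. */
	if (!inode->is_shmem)
		return -1;

	info = model_shmem_i(inode);
	info->lock = 1;		/* "spin_lock_irq(&info->lock)" */
	*alloced = info->alloced;
	info->lock = 0;		/* "spin_unlock_irq(&info->lock)" */
	return 0;
}
```

The is_shmem tag exists only to keep the model well-defined. Judging
by the trace, the real shmem_getattr() has no equivalent check and goes
straight for the lock in the containing struct, which for an inode
from alloc_anon_inode() is memory that was never allocated as part of
a shmem_inode_info.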
For instance, the following trace was seen when executing 'sudo lsof'
while a process/guest was running with an open memfd FD:

[24393.121409] general protection fault, probably for non-canonical address 0xfe9fb182fea3f077: 0000 [#1] PREEMPT SMP NOPTI
[24393.133546] CPU: 2 PID: 590073 Comm: lsof Tainted: G E 6.1.0-rc4-upm10b-host-snp-v8b+ #4
[24393.144125] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1009B 05/14/2022
[24393.153150] RIP: 0010:native_queued_spin_lock_slowpath+0x3a3/0x3e0
[24393.160049] Code: f3 90 41 8b 04 24 85 c0 74 ea eb f4 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 00 41 04 00 48 03 04 d5 e0 ea 8b 82 <48> 89 18 8b 43 08 85 c0 75 09 f3 90 8b 43 08 85 c0 74 f7 48 8b 13
[24393.181004] RSP: 0018:ffffc9006b6a3cf8 EFLAGS: 00010086
[24393.186832] RAX: fe9fb182fea3f077 RBX: ffff889fcc144100 RCX: 0000000000000000
[24393.194793] RDX: 0000000000003ffe RSI: ffffffff827acde9 RDI: ffffc9006b6a3cdf
[24393.202751] RBP: ffffc9006b6a3d20 R08: 0000000000000001 R09: 0000000000000000
[24393.210710] R10: 0000000000000000 R11: 000000000000ffff R12: ffff888179fa50e0
[24393.218670] R13: ffff889fcc144100 R14: 00000000000c0000 R15: 00000000000c0000
[24393.226629] FS: 00007f9440f45400(0000) GS:ffff889fcc100000(0000) knlGS:0000000000000000
[24393.235692] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24393.242101] CR2: 000055c55a9cf088 CR3: 0008000220e9c003 CR4: 0000000000770ee0
[24393.250059] PKRU: 55555554
[24393.253073] Call Trace:
[24393.255797] <TASK>
[24393.258133] do_raw_spin_lock+0xc4/0xd0
[24393.262410] _raw_spin_lock_irq+0x50/0x70
[24393.266880] ? shmem_getattr+0x4c/0xf0
[24393.271060] shmem_getattr+0x4c/0xf0
[24393.275044] restrictedmem_getattr+0x34/0x40
[24393.279805] vfs_getattr_nosec+0xbd/0xe0
[24393.284178] vfs_getattr+0x37/0x50
[24393.287971] vfs_statx+0xa0/0x150
[24393.291668] vfs_fstatat+0x59/0x80
[24393.295462] __do_sys_newstat+0x35/0x70
[24393.299739] __x64_sys_newstat+0x16/0x20
[24393.304111] do_syscall_64+0x3b/0x90
[24393.308098] entry_SYSCALL_64_after_hwframe+0x63/0xcd

As a workaround we've been doing the following, but it's probably not
the proper fix:

  https://github.com/AMDESE/linux/commit/0378116b5c4e373295c9101727f2cb5112d6b1f4

-Mike