Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 16, 2023 at 03:21:21PM +0530, Nikunj A. Dadhania wrote:
> 
> > +static struct file *restrictedmem_file_create(struct file *memfd)
> > +{
> > +	struct restrictedmem_data *data;
> > +	struct address_space *mapping;
> > +	struct inode *inode;
> > +	struct file *file;
> > +
> > +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +	if (!data)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	data->memfd = memfd;
> > +	mutex_init(&data->lock);
> > +	INIT_LIST_HEAD(&data->notifiers);
> > +
> > +	inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> > +	if (IS_ERR(inode)) {
> > +		kfree(data);
> > +		return ERR_CAST(inode);
> > +	}
> 
> alloc_anon_inode() uses new_pseudo_inode() to get the inode. As per the comment, new inode 
> is not added to the superblock s_inodes list.

Another issue somewhat related to alloc_anon_inode() is that the shmem code
in some cases assumes the inode struct was allocated via shmem_alloc_inode(),
which allocates a struct shmem_inode_info, which is a superset of struct inode
with additional fields for things like spinlocks.

These additional fields don't get allocated/ininitialized in the case of
restrictedmem, so when restrictedmem_getattr() tries to pass the inode on to
shmem handler, it can cause a crash.

For instance, the following trace was seen when executing 'sudo lsof' while a
process/guest was running with an open memfd FD:

    [24393.121409] general protection fault, probably for non-canonical address 0xfe9fb182fea3f077: 0000 [#1] PREEMPT SMP NOPTI
    [24393.133546] CPU: 2 PID: 590073 Comm: lsof Tainted: G            E      6.1.0-rc4-upm10b-host-snp-v8b+ #4
    [24393.144125] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1009B 05/14/2022
    [24393.153150] RIP: 0010:native_queued_spin_lock_slowpath+0x3a3/0x3e0
    [24393.160049] Code: f3 90 41 8b 04 24 85 c0 74 ea eb f4 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 00 41 04 00 48 03 04 d5 e0 ea 8b 82 <48> 89 18 8b 43 08 85 c0 75 09 f3 90 8b 43 08 85 c0 74 f7 48 8b 13
    [24393.181004] RSP: 0018:ffffc9006b6a3cf8 EFLAGS: 00010086
    [24393.186832] RAX: fe9fb182fea3f077 RBX: ffff889fcc144100 RCX: 0000000000000000
    [24393.194793] RDX: 0000000000003ffe RSI: ffffffff827acde9 RDI: ffffc9006b6a3cdf
    [24393.202751] RBP: ffffc9006b6a3d20 R08: 0000000000000001 R09: 0000000000000000
    [24393.210710] R10: 0000000000000000 R11: 000000000000ffff R12: ffff888179fa50e0
    [24393.218670] R13: ffff889fcc144100 R14: 00000000000c0000 R15: 00000000000c0000
    [24393.226629] FS:  00007f9440f45400(0000) GS:ffff889fcc100000(0000) knlGS:0000000000000000
    [24393.235692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [24393.242101] CR2: 000055c55a9cf088 CR3: 0008000220e9c003 CR4: 0000000000770ee0
    [24393.250059] PKRU: 55555554
    [24393.253073] Call Trace:
    [24393.255797]  <TASK>
    [24393.258133]  do_raw_spin_lock+0xc4/0xd0
    [24393.262410]  _raw_spin_lock_irq+0x50/0x70
    [24393.266880]  ? shmem_getattr+0x4c/0xf0
    [24393.271060]  shmem_getattr+0x4c/0xf0
    [24393.275044]  restrictedmem_getattr+0x34/0x40
    [24393.279805]  vfs_getattr_nosec+0xbd/0xe0
    [24393.284178]  vfs_getattr+0x37/0x50
    [24393.287971]  vfs_statx+0xa0/0x150
    [24393.291668]  vfs_fstatat+0x59/0x80
    [24393.295462]  __do_sys_newstat+0x35/0x70
    [24393.299739]  __x64_sys_newstat+0x16/0x20
    [24393.304111]  do_syscall_64+0x3b/0x90
    [24393.308098]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

As a workaround we've been doing the following, but it's probably not the
proper fix:

  https://github.com/AMDESE/linux/commit/0378116b5c4e373295c9101727f2cb5112d6b1f4

-Mike




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux