> -----Original Message----- > From: Cornelia Huck <cohuck@xxxxxxxxxx> > Sent: Thursday, May 9, 2019 4:18 AM > To: Parav Pandit <parav@xxxxxxxxxxxx> > Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > kwankhede@xxxxxxxxxx; alex.williamson@xxxxxxxxxx; cjia@xxxxxxxxxx > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on > stale device removal > > On Wed, 8 May 2019 22:13:28 +0000 > Parav Pandit <parav@xxxxxxxxxxxx> wrote: > > > > -----Original Message----- > > > From: Cornelia Huck <cohuck@xxxxxxxxxx> > > > Sent: Wednesday, May 8, 2019 12:17 PM > > > To: Parav Pandit <parav@xxxxxxxxxxxx> > > > Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > > > kwankhede@xxxxxxxxxx; alex.williamson@xxxxxxxxxx; cjia@xxxxxxxxxx > > > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove > > > file on stale device removal > > > > > > On Tue, 30 Apr 2019 17:49:36 -0500 > > > Parav Pandit <parav@xxxxxxxxxxxx> wrote: > > > > > > > If device is removal is initiated by two threads as below, mdev > > > > core attempts to create a syfs remove file on stale device. > > > > During this flow, below [1] call trace is observed. > > > > > > > > cpu-0 cpu-1 > > > > ----- ----- > > > > mdev_unregister_device() > > > > device_for_each_child > > > > mdev_device_remove_cb > > > > mdev_device_remove > > > > user_syscall > > > > remove_store() > > > > mdev_device_remove() > > > > [..] > > > > unregister device(); > > > > /* not found in list or > > > > * active=false. > > > > */ > > > > sysfs_create_file() > > > > ..Call trace > > > > > > > > Now that mdev core follows correct device removal system of the > > > > linux bus model, remove shouldn't fail in normal cases. If it > > > > fails, there is no point of creating a stale file or checking for specific > error status. > > > > > > Which error cases are left? Is there anything that does not indicate > > > that something got terribly messed up internally? > > > > > Few reasons I can think of that can fail remove are: > > > > 1. Some device removal requires allocating memory too as it needs to issue > commands to device. > > If on the path, such allocation fails, remove can fail. However such fail to > allocate memory will probably result into more serious warnings before this. > > Nod. If we're OOM, we probably have some bigger problems anyway. > > > 2. if the device firmware has crashed, device removal commands will likely > timeout and return such error upto user. > > In that case, I'd consider the device pretty much unusable in any case. > Right. > > 3. If user tries to remove a device, while parent is already in removal path, > this call will eventually fail as it won't find the device in the internal list. > > This should be benign, I think. > Right. > > > > > > > > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 > > > > sysfs_create_file_ns+0x7f/0x90 > > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted > > > > 5.1.0-rc6-vdevbus+ #6 > > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS > > > > 2.0b > > > > 08/09/2016 > > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90 > > > > kernel: Call Trace: > > > > kernel: remove_store+0xdc/0x100 [mdev] > > > > kernel: kernfs_fop_write+0x113/0x1a0 > > > > kernel: vfs_write+0xad/0x1b0 > > > > kernel: ksys_write+0x5a/0xe0 > > > > kernel: do_syscall_64+0x5a/0x210 > > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > > > --- > > > > drivers/vfio/mdev/mdev_sysfs.c | 4 +--- > > > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/vfio/mdev/mdev_sysfs.c > > > > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201 > > > > 100644 > > > > --- a/drivers/vfio/mdev/mdev_sysfs.c > > > > +++ b/drivers/vfio/mdev/mdev_sysfs.c > > > > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device > > > > *dev, > > > struct device_attribute *attr, > > > > int ret; > > > > > > > > ret = mdev_device_remove(dev); > > > > - if (ret) { > > > > - device_create_file(dev, attr); > > > > + if (ret) > > > > > > Should you merge this into the previous patch? > > > > > I am not sure. Previous patch changes the sequence. I think that deserved > an own patch by itself. > > This change is making use of that sequence. > > So its easier to review? Alex had comment in v0 to split into more logical > patches, so... > > Specially to capture a different call trace, I cut into different patch. > > Otherwise previous patch's commit message is too long. > > I'm not sure if splitting out this one is worth it... your call. > Ok. either ways... > > > > > > return ret; > > > > - } > > > > } > > > > > > > > return count; > >