Re: [RFC v1 3/3] vfio-ccw: Release any channel program when releasing/removing vfio-ccw mdev

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 3 Apr 2019 11:17:00 -0400
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

> On 04/03/2019 04:15 AM, Cornelia Huck wrote:
> > On Tue,  2 Apr 2019 12:44:35 -0400
> > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> >   
> >> When releasing the vfio-ccw mdev, we currently do not release
> >> any existing channel program and it's pinned pages. This can
> >> lead to the following warning:
> >>
> >> [1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
> >>
> >> ....
> >>
> >> 1038876.561921] Call Trace:
> >> [1038876.561935] ([<00000009897fb870>] 0x9897fb870)
> >> [1038876.561949]  [<000003ff8013bf62>] vfio_iommu_type1_detach_group+0xda/0x2f0 [vfio_iommu_type1]
> >> [1038876.561965]  [<000003ff8007b634>] __vfio_group_unset_container+0x64/0x190 [vfio]
> >> [1038876.561978]  [<000003ff8007b87e>] vfio_group_put_external_user+0x26/0x38 [vfio]
> >> [1038876.562024]  [<000003ff806fc608>] kvm_vfio_group_put_external_user+0x40/0x60 [kvm]
> >> [1038876.562045]  [<000003ff806fcb9e>] kvm_vfio_destroy+0x5e/0xd0 [kvm]
> >> [1038876.562065]  [<000003ff806f63fc>] kvm_put_kvm+0x2a4/0x3d0 [kvm]
> >> [1038876.562083]  [<000003ff806f655e>] kvm_vm_release+0x36/0x48 [kvm]
> >> [1038876.562098]  [<00000000003c2dc4>] __fput+0x144/0x228
> >> [1038876.562113]  [<000000000016ee82>] task_work_run+0x8a/0xd8
> >> [1038876.562125]  [<000000000014c7a8>] do_exit+0x5d8/0xd90
> >> [1038876.562140]  [<000000000014d084>] do_group_exit+0xc4/0xc8
> >> [1038876.562155]  [<000000000015c046>] get_signal+0x9ae/0xa68
> >> [1038876.562169]  [<0000000000108d66>] do_signal+0x66/0x768
> >> [1038876.562185]  [<0000000000b9e37e>] system_call+0x1ea/0x2d8
> >> [1038876.562195] 2 locks held by qemu-system-s39/144727:
> >> [1038876.562205]  #0: 00000000537abaf9 (&container->group_lock){++++}, at: __vfio_group_unset_container+0x3c/0x190 [vfio]
> >> [1038876.562230]  #1: 00000000670008b5 (&iommu->lock){+.+.}, at: vfio_iommu_type1_detach_group+0x36/0x2f0 [vfio_iommu_type1]
> >> [1038876.562250] Last Breaking-Event-Address:
> >> [1038876.562262]  [<000003ff8013aa24>] vfio_sanity_check_pfn_list+0x3c/0x70 [vfio_iommu_type1]
> >> [1038876.562272] irq event stamp: 4236481
> >> [1038876.562287] hardirqs last  enabled at (4236489): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> >> [1038876.562299] hardirqs last disabled at (4236496): [<00000000001ce87e>] console_unlock+0xd6/0x740
> >> [1038876.562311] softirqs last  enabled at (4234162): [<0000000000b9fa1e>] __do_softirq+0x556/0x598
> >> [1038876.562325] softirqs last disabled at (4234153): [<000000000014e4cc>] irq_exit+0xac/0x108
> >> [1038876.562337] ---[ end trace 6c96d467b1c3ca06 ]---
> >>
> >> Similarly we do not free the channel program when we are removing
> >> the vfio-ccw device. Let's fix this by resetting the device and freeing
> >> the channel program and pinned pages in both the release and remove
> >> path.
> >>
> >> Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx>
> >> ---
> >>   drivers/s390/cio/vfio_ccw_ops.c | 9 +++++++++
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> >> index ec2f796c..763c350 100644
> >> --- a/drivers/s390/cio/vfio_ccw_ops.c
> >> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> >> @@ -138,6 +138,7 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
> >>   		/* The state will be NOT_OPER on error. */
> >>   	}
> >>   
> >> +	cp_free(&private->cp);
> >>   	private->mdev = NULL;
> >>   	atomic_inc(&private->avail);
> >>   
> >> @@ -171,6 +172,14 @@ static void vfio_ccw_mdev_release(struct mdev_device *mdev)
> >>   		dev_get_drvdata(mdev_parent_dev(mdev));
> >>   	int i;
> >>   
> >> +	if ((private->state != VFIO_CCW_STATE_NOT_OPER) &&
> >> +	    (private->state != VFIO_CCW_STATE_STANDBY)) {
> >> +		if (!vfio_ccw_mdev_reset(mdev))
> >> +			private->state = VFIO_CCW_STATE_STANDBY;
> >> +		/* The state will be NOT_OPER on error. */
> >> +	}  
> > 
> > Ok, now I have gotten lost in that maze of unregister, release, remove,
> > and whatever functions. Does it even make sense here to do the
> > disable/enable reset, or is disabling the subchannel the more sensible
> > approach? Isn't the device going away anyway?  
> 
> 
> If it helps, we enter the release path when we either do a device hot 
> unplug, or the guest dies, shuts down or I believe closes the mediated 
> device file descriptor. For example a stacktrace for release would be 
> something like this:
> 
> [  389.970906]  [<000003ff8033583e>] vfio_ccw_mdev_release+0x36/0xd0 
> [vfio_ccw]
> [  389.970910]  [<000003ff8031e1d8>] vfio_mdev_release+0x38/0x58 
> [vfio_mdev]
> [  389.970915]  [<000003ff80074832>] vfio_device_fops_release+0x3a/0x60 
> [vfio]
> [  389.970919]  [<00000000003c2dc4>] __fput+0x144/0x228
> [  389.970924]  [<000000000016ee82>] task_work_run+0x8a/0xd8
> [  389.970928]  [<00000000001094ae>] do_notify_resume+0x46/0x88
> [  389.970932]  [<0000000000b9e276>] system_call+0xe2/0x2d8
> 
> 
> OTOH remove is called when we write to remove file for the mediated 
> device and an example stacktrace would look like this:
> 
> [  285.190391]  [<000003ff802458b0>] vfio_ccw_mdev_remove+0x40/0x78 
> [vfio_ccw]
> [  285.190397]  [<000003ff801a499c>] mdev_device_remove_ops+0x3c/0x80 
> [mdev]
> [  285.190402]  [<000003ff801a4d5c>] mdev_device_remove+0xc4/0x130 [mdev]
> [  285.190407]  [<000003ff801a5074>] remove_store+0x6c/0xa8 [mdev]
> [  285.190412]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> [  285.190417]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> [  285.190421]  [<00000000003c187c>] vfs_write+0xb4/0x198
> [  285.190425]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> [  285.190430]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> 
> 
> Now trying to answer your question, I think reseting the device on a 
> release makes more sense. This is because the mediated device still 
> exists but there is no user using the device.
> 
> On the remove case the mediated device is going away for good and so 
> maybe just disabling the subchannel makes more sense in that case.

Thanks for the backtraces; and yes, I agree that reset on release and
quiesce on remove look reasonable.

> 
> Thanks
> Farhan
> 
> 
> >   
> >> +
> >> +	cp_free(&private->cp);
> >>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> >>   				 &private->nb);
> >>     
> > 
> >   
> 




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux