On Wednesday, February 21, 2024 1:03 AM, Jason Gunthorpe wrote: > On Tue, Feb 20, 2024 at 03:53:08PM +0000, Zeng, Xin wrote: > > On Tuesday, February 20, 2024 9:25 PM, Jason Gunthorpe wrote: > > > To: Zeng, Xin <xin.zeng@xxxxxxxxx> > > > Cc: Yishai Hadas <yishaih@xxxxxxxxxx>; herbert@xxxxxxxxxxxxxxxxxxx; > > > alex.williamson@xxxxxxxxxx; shameerali.kolothum.thodi@xxxxxxxxxx; > Tian, > > > Kevin <kevin.tian@xxxxxxxxx>; linux-crypto@xxxxxxxxxxxxxxx; > > > kvm@xxxxxxxxxxxxxxx; qat-linux <qat-linux@xxxxxxxxx>; Cao, Yahui > > > <yahui.cao@xxxxxxxxx> > > > Subject: Re: [PATCH 10/10] vfio/qat: Add vfio_pci driver for Intel QAT VF > devices > > > > > > On Sat, Feb 17, 2024 at 04:20:20PM +0000, Zeng, Xin wrote: > > > > > > > Thanks for this information, but this flow is not clear to me why it > > > > cause deadlock. From this flow, CPU0 is not waiting for any resource > > > > held by CPU1, so after CPU0 releases mmap_lock, CPU1 can continue > > > > to run. Am I missing something? > > > > > > At some point it was calling copy_to_user() under the state > > > mutex. These days it doesn't. > > > > > > copy_to_user() would nest the mm_lock under the state mutex which is > a > > > locking inversion. > > > > > > So I wonder if we still have this problem now that the copy_to_user() > > > is not under the mutex? > > > > In protocol v2, we still have the scenario in precopy_ioctl where > copy_to_user is > > called under state_mutex. > > Why? Does mlx5 do that? It looked Ok to me: > > mlx5vf_state_mutex_unlock(mvdev); > if (copy_to_user((void __user *)arg, &info, minsz)) > return -EFAULT; Indeed, thanks, Jason. BTW, is there any reason why was "deferred_reset" mode still implemented in mlx5 driver given this deadlock condition has been avoided with migration protocol v2 implementation. Anyway, I'll use state_mutex directly instead of the "deferred_reset" mode in qat variant driver and update it in next version soon, please help to review. Thanks, Xin