Sorry for the late reply.

On 25/06/2022 07:47, Jason Gunthorpe wrote:
> On Fri, Jun 24, 2022 at 04:26:06PM -0700, Bart Van Assche wrote:
>> On 6/24/22 15:59, Jason Gunthorpe wrote:
>>> I don't even understand how get_device() prevents this call chain??
>>>
>>> It looks to me like the problem is srp_remove_one() is not waiting for
>>> or canceling some outstanding work.
>> Hi Jason,
>>
>> My conclusions from the call traces in Li's email are as follows:
>> * scsi_host_dev_release() can get called after srp_remove_one().
>> * srp_exit_cmd_priv() uses the ib_device pointer. If srp_remove_one() is
>> called before srp_exit_cmd_priv() then a use-after-free is triggered.
> Shouldn't srp_remove_one() wait for the scsi_host_dev to complete
> destruction? Clearly it cannot continue to exist once the IB device
> has been removed

Yes, that matches my first thought, but I didn't know of any way to tell
the SCSI side to destroy itself other than scsi_host_put(), which is
already called once in the chain below:

  srp_remove_one()
  -> srp_queue_remove_work()
  -> srp_remove_target()
  -> scsi_remove_host()
  -> scsi_host_put()

That means scsi_host_dev is still referenced by other components that we
have to notify.

>
>> Is calling get_device() and put_device() on the struct ib_device an
>> acceptable way to fix this?
> As I said, I don't understand at all how this works. get_device() does
> not prevent srp_remove_one() from being called.

I originally thought that srp_remove_one() was called from
put_device(ibdev), so taking an extra reference on the ib_device would
keep it from being released early.

Thanks
Zhijian

> Jason