On 6/24/22 15:59, Jason Gunthorpe wrote:
> I don't even understand how get_device() prevents this call chain??
> It looks to me like the problem is srp_remove_one() is not waiting for
> or canceling some outstanding work.
Hi Jason,
My conclusions from the call traces in Li's email are as follows:
* scsi_host_dev_release() can get called after srp_remove_one().
* srp_exit_cmd_priv() uses the ib_device pointer. If srp_remove_one() is
called before srp_exit_cmd_priv() then a use-after-free is triggered.
Is calling get_device() and put_device() on the struct ib_device an
acceptable way to fix this? If so, I recommend inserting a get_device()
call after the scsi_add_host() call and a put_device() call after each
of the two scsi_remove_host() calls, instead of merging the patch at the
start of this email thread.
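
To illustrate the reference-counting idea behind that suggestion, here is
a minimal userspace sketch. It is not ib_srp code: every mock_* name is
hypothetical, the refcount is a plain int rather than a kref, and the
sketch assumes the late callback runs before the SCSI host side drops its
reference. It does not model when scsi_remove_host() actually triggers
srp_exit_cmd_priv().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ib_device with an embedded refcount. */
struct mock_ib_device {
	int refcount;
	bool released;	/* set instead of freeing so the sketch can inspect it */
};

/* Stand-in for get_device(): take one reference. */
static void mock_get_device(struct mock_ib_device *dev)
{
	dev->refcount++;
}

/* Stand-in for put_device(): drop one reference, release on the last one. */
static void mock_put_device(struct mock_ib_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->released = true;
}

/*
 * Stand-in for a late user of the device pointer such as
 * srp_exit_cmd_priv(). Returns true if the device is still alive.
 */
static bool mock_exit_cmd_priv(const struct mock_ib_device *dev)
{
	return !dev->released;
}

/*
 * Walk through one teardown. Returns true if the late callback saw a
 * live device, false if it would have been a use-after-free.
 */
bool mock_teardown(bool host_holds_ref)
{
	struct mock_ib_device dev = { .refcount = 1, .released = false };
	bool ok;

	if (host_holds_ref)
		mock_get_device(&dev);	/* proposed: after scsi_add_host() */

	mock_put_device(&dev);		/* device removal drops the last core reference */

	ok = mock_exit_cmd_priv(&dev);	/* late callback uses the device pointer */

	if (host_holds_ref)
		mock_put_device(&dev);	/* proposed: after scsi_remove_host() */

	return ok;
}
```

With the extra reference held, mock_teardown(true) reports that the
callback saw a live device; without it, mock_teardown(false) reports the
use-after-free case.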
Thanks,
Bart.