On 1/10/25 9:34 PM, Haoran Zhang wrote: > Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command. I don't think that git commit is correct. It should be: 25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session") > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > index 718fa4e0b31e..b994138837f2 100644 > --- a/drivers/vhost/scsi.c > +++ b/drivers/vhost/scsi.c > @@ -1726,7 +1726,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > mutex_unlock(&tpg->tv_tpg_mutex); > mutex_unlock(&vhost_scsi_mutex); > ret = -EEXIST; > - goto undepend; > + goto free_vs_tpg; I agree you found a bug, but I'm not sure how you hit it from here. A couple lines before this, we do: if (tpg->tv_tpg_vhost_count != 0) { mutex_unlock(&tpg->tv_tpg_mutex); continue; } If the tpg was already in the vs_tpg, then tv_tpg_vhost_count would be non-zero and we would hit the check above. Could you describe the target and tpg mapping for how you hit this? > } > /* > * In order to ensure individual vhost-scsi configfs However, I was able to replicate the bug by hitting the chunk below this comment where we do: ret = target_depend_item(&se_tpg->tpg_group.cg_item); if (ret) { pr_warn("target_depend_item() failed: %d\n", ret); mutex_unlock(&tpg->tv_tpg_mutex); mutex_unlock(&vhost_scsi_mutex); goto undepend; > @@ -1802,6 +1802,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, > target_undepend_item(&tpg->se_tpg.tpg_group.cg_item); > } > } > +free_vs_tpg: > kfree(vs_tpg); To fix the bug, I don't think we can just free the vs_tpg. There's 2 cases we can hit the error path: 1. First time calling vhost_scsi_set_endpoint. If we have a target with 2 tpgs, and for tpg1 we did a successful target_depend_item call, but then we looped and for tpg2 target_depend_item failed, if we just did a kfree then we would leave a refcount on tpg1. So for this case, we need to do the "goto undepend". 2. N > 1 time calling vhost_scsi_set_endpoint. This one is more complicated because let's say we started with 1 tpg and on the first call to vhost_scsi_set_endpoint we successfully did target_depend_item on it. Before the 2nd call to vhost_scsi_set_endpoint we added tpg2 and tpg3. We then do vhost_scsi_set_endpoint for the 2nd time, we successfully do target_depend_item on tpg2, but it fails for tpg3. In this case, we want to unwind what we did on this 2nd call, so we want to do target_undepend_item on tpg2. And, we don't want to call it for tpg1 or we will hit the bug you found. So I think to fix the issue, we would want to: 1. move the memcpy(vs_tpg, vs->vs_tpg, len); to the end of the function after we do the vhost_scsi_flush. This will be more complicated than the current memcpy though. We will want to merge the local vs_tpg and the vs->vs_tpg like: for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) { if (vs_tpg[i]) vs->vs_tpg[i] = vs_tpg[i]) } 2. We want to leave the "goto undepend" calls as is. For the the undepend goto handling we also want to leave the code as is. We want to continue to loop over the local vs_tpg because after we moved the memcpy for #1 it now only contains the tpgs we updated on the current vhost_scsi_set_endpoint call.