> I see now and can replicate it. I think there is a 2nd bug in > vhost_scsi_set_endpoint related to all this where we need to > prevent switching targets like this or else we'll leak some > other refcounts. If 500140501e23be28's tpg number was 3 then > we would overwrite the existing vs->vs_vhost_wwpn and never > be able to release the refounts on the tpgs from 500140562c8936fa. > > I'll send a patchset to fix everything and cc you. > > Thanks for all the work you did testing and debugging this > issue. You are welcome. There is another bug I was about to report, but I'm not sure whether I should create a new thread. I feel that the original design of dynamically allocating new vs_tpgs in vhost_scsi_set_endpoint is not intuitive, and copying TPGs before setting the target doesn't seem logical. Since you are already refactoring the code, maybe I should post it here so we can address these issues in one go. [PATCH] vhost/scsi: Fix dangling pointer in vhost_scsi_set_endpoint() Since commit 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate if the endpoint is setup"), a dangling pointer issue has been introduced in vhost_scsi_set_endpoint() when the host fails to reconfigure the vhost-scsi endpoint. Specifically, this causes a UAF fault in vhost_scsi_get_req() when the guest attempts to send an SCSI request. In vhost_scsi_set_endpoint(), vhost_scsi->vs_tpg holds the same pointer as vq->private_data. The code allocates a new vs_tpg array to hold the TPGs and updates vq->private_data if a match is found between vhost_scsi_list's tport_name and the target WWPN. However, the code ignored the case where `match == 0` (i.e. the target endpoint does not match any TPGs in vhost_scsi_list). In this scenario, it directly frees the old vs_tpg and updates vhost_scsi->vs_tpg without modifying vq->private_data, leaving all vq's backend pointer dangling. As a result, subsequent requests from the guest will trigger a UAF fault on vs_tpg in vhost_scsi_get_req(). Below is the KASAN report: [ 68.606821] BUG: KASAN: slab-use-after-free in vhost_scsi_get_req+0xef/0x1f0 [ 68.607671] Read of size 8 at addr ffff8880087a1008 by task vhost-1440/1460 [ 68.608570] [ 68.612070] Call Trace: [ 68.612429] <TASK> [ 68.612739] dump_stack_lvl+0x9e/0xd0 [ 68.613206] print_report+0xd1/0x670 [ 68.613711] ? __virt_addr_valid+0x54/0x250 [ 68.614232] ? kasan_complete_mode_report_info+0x6a/0x200 [ 68.614879] kasan_report+0xd6/0x120 [ 68.615329] ? vhost_scsi_get_req+0xef/0x1f0 [ 68.615869] ? vhost_scsi_get_req+0xef/0x1f0 [ 68.616406] __asan_load8+0x8b/0xe0 [ 68.616854] vhost_scsi_get_req+0xef/0x1f0 [ 68.617362] vhost_scsi_handle_vq+0x30f/0x15e0 [ 68.622248] ... [ 68.622868] vhost_scsi_handle_kick+0x39/0x50 [ 68.623409] vhost_run_work_list+0xd9/0x120 [ 68.623939] ? __pfx_vhost_run_work_list+0x10/0x10 [ 68.624521] vhost_task_fn+0xf8/0x240 [ 68.629164] ... [ 68.629705] ret_from_fork+0x5d/0x80 [ 68.630155] ? __pfx_vhost_task_fn+0x10/0x10 [ 68.630693] ret_from_fork_asm+0x1a/0x30 [ 68.631179] RIP: 0033:0x0 [ 68.631516] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 68.637405] </TASK> [ 68.637670] [ 68.637798] Allocated by task 1440: [ 68.638124] kasan_save_stack+0x39/0x70 [ 68.638607] kasan_save_track+0x14/0x40 [ 68.638919] kasan_save_alloc_info+0x37/0x60 [ 68.639331] __kasan_kmalloc+0xc3/0xd0 [ 68.639625] __kmalloc_cache_noprof+0x186/0x3a0 [ 68.639971] vhost_scsi_ioctl+0x630/0xec0 [ 68.640392] __x64_sys_ioctl+0x126/0x160 [ 68.640755] x64_sys_call+0x11ad/0x25f0 [ 68.641076] do_syscall_64+0x7e/0x170 [ 68.641437] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 68.641887] [ 68.642016] Freed by task 1461: [ 68.642301] kasan_save_stack+0x39/0x70 [ 68.642622] kasan_save_track+0x14/0x40 [ 68.642958] kasan_save_free_info+0x3b/0x60 [ 68.643352] __kasan_slab_free+0x52/0x80 [ 68.643675] kfree+0x129/0x440 [ 68.643973] vhost_scsi_ioctl+0xd74/0xec0 [ 68.644290] __x64_sys_ioctl+0x126/0x160 [ 68.644589] x64_sys_call+0x11ad/0x25f0 [ 68.644939] do_syscall_64+0x7e/0x170 [ 68.645377] entry_SYSCALL_64_after_hwframe+0x76/0x7e To address this issue, we need to prevent the `free(vs_tpg)` operation from being triggered by the `match == 0` branch , either by moving the free operation inside the if-clause or by adding a goto statement in the else-clause. Here is an alternative patch commit. Fixes: 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate if the endpoint is setup") Signed-off-by: Haoran Zhang <wh1sper@xxxxxxxxxx> --- drivers/vhost/scsi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 718fa4e0b31e..1e15eab530d7 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1775,6 +1775,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, ret = 0; } else { ret = -EEXIST; + goto undepend; } /* -- 2.43.0