On Tue, Apr 09, 2024 at 04:55:25PM -0500, michael.christie@xxxxxxxxxx wrote: > On 4/9/24 11:40 AM, Michael S. Tsirkin wrote: > > On Tue, Apr 09, 2024 at 09:57:00AM -0500, Mike Christie wrote: > >> On 4/8/24 11:16 PM, Jason Wang wrote: > >>>> It turns out only vhost-scsi had an issue where it would send a command > >>>> to the block/LIO layer, wait for a response and then process in the vhost > >>>> task. > >>> Vhost-net TX zerocopy code did the same: > >>> > >>> It sends zerocopy packets to the under layer and waits for the > >>> underlayer. When the DMA is completed, vhost_zerocopy_callback will be > >>> called to schedule vq work for used ring updating. > >> > >> I think it might be slightly different and vhost-net is ok. > >> > >> Above I meant, vhost-scsi would do the equivalent of: > >> > >> vhost_zerocopy_callback -> vhost_net_ubuf_put > >> > >> from vhost_task/thread. > >> > >> For vhost-net then, if we get an early exit via a signal the patches will flush > >> queued works and prevent new ones from being queued. vhost_net_release will run > >> due to the process/thread exit code releasing it's refcounts on the device. > >> > >> vhost_net_release will then do: > >> > >> vhost_net_flush -> vhost_net_ubuf_put_and_wait > >> > >> and wait for vhost_zerocopy_callback -> vhost_net_ubuf_put calls. > >> > >> For vhost-scsi we would hang. We would do our equivalent of vhost_net_ubuf_put > >> from the vhost task/thread. So, when our release function, vhost_scsi_release, > >> is called we will also do a similar wait. However, because the task/thread is > >> killed, we will hang there since the "put" will never be done. > > > > If you can find a way to cancel them instead of flushing out, > > I think it would be better for net, too. > > I don't think canceling block requests will be possible any time soon. > It's one of those things that people have wanted for decades but it gets > messy when you are dealing multiple layers (fs, md/md, request queue, > device driver, etc) that some don't even use the same struct sometimes > (bios vs requests) and you don't want to hurt performance. > > So for vhost-scsi we can't cancel running block layer IOs because there > is no interface for it. We have to sit around and wait for them to > complete or timeout. > > If you are saying cancel at just the vhost/vhost-net/vhost-scsi layers > then I can do that. However, I think it might be messier. I think that'd be kind of pointless ... > We would clean > up some of the vhost related resource, then return from the > file_operations->release functions. Then we would have to have some > workqueue that just waits for driver specific responses (block layer > responses for vhost-scsi). When it gets them, then it does "puts" on > it's structs and eventually they are freed.