On Tue, Apr 09, 2024 at 12:16:36PM +0800, Jason Wang wrote:
> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> <michael.christie@xxxxxxxxxx> wrote:
> >
> > The following patches were made over Linus's tree and also apply over
> > mst's vhost branch. The patches add the ability for vhost_tasks to
> > handle SIGKILL by flushing queued works, stopping new works from being
> > queued, and preparing the task for an early exit.
> >
> > This removes the need for the signal/coredump hacks added in:
> >
> > Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >
> > when the vhost_task patches were initially merged, and fixes the issue
> > in this thread:
> >
> > https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
> >
> > Long Background:
> >
> > The original vhost worker code didn't support any signals. If the
> > userspace application that owned the worker got a SIGKILL, the app/
> > process would exit, dropping all references to the device, and then the
> > file operation's release function would be called. From there we would
> > wait on running IO and then clean up the device's memory.
> >
> > When we switched to vhost_tasks being a thread in the owner's process we
> > added some hacks to the signal/coredump code so we could continue to
> > wait on running IO and process it from the vhost_task. The idea was that
> > we would eventually remove the hacks. We recently hit this bug:
> >
> > https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
> >
> > It turns out only vhost-scsi had an issue where it would send a command
> > to the block/LIO layer, wait for a response and then process it in the
> > vhost task.
>
> Vhost-net TX zerocopy code did the same:
>
> It sends zerocopy packets to the lower layer and waits for the lower
> layer. When the DMA is completed, vhost_zerocopy_callback will be
> called to schedule vq work for used ring updating.

Yea. It's still experimental though, so I'm not sure how stressed to be
about it. I guess we can ignore it for now, but yes, it was one of the
big issues with tx zerocopy and this patchset opens the path to
productizing it.

> > So patches 1-5 prepare vhost-scsi to handle the case where the
> > vhost_task is killed while we still have commands outstanding. The next
> > patches then prepare and convert the vhost and vhost_task layers to
> > handle SIGKILL by flushing running works, marking the vhost_task as dead
> > so there are no future uses, then exiting.
>
> Thanks
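The SIGKILL handling described in the cover letter (flush what is already queued, mark the task dead so nothing new is accepted, then exit) can be modeled in a few lines of C. This is a minimal, single-threaded sketch under stated assumptions, not the vhost or vhost_task code: every name here (`sketch_worker`, `sketch_queue_work`, `sketch_handle_sigkill`, `sketch_count_work`) is hypothetical, and the locking the real patches need so that a concurrent queuer cannot slip a work in between the flush and the "dead" marking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model of a worker that handles SIGKILL.
 * None of these names are the kernel's vhost/vhost_task API, and no
 * locking against concurrent queuers is modeled.
 */

#define SKETCH_MAX_WORKS 16

struct sketch_work {
	void (*fn)(struct sketch_work *work);
	int id;
};

struct sketch_worker {
	struct sketch_work *queue[SKETCH_MAX_WORKS];
	size_t head, tail;	/* works in [head, tail) are pending */
	bool dead;		/* set once SIGKILL has been handled */
};

/* Queue a work; returns false if it was rejected (dead or queue full). */
static bool sketch_queue_work(struct sketch_worker *w, struct sketch_work *work)
{
	if (w->dead || w->tail == SKETCH_MAX_WORKS)
		return false;
	w->queue[w->tail++] = work;
	return true;
}

/* Run every pending work, including any queued by a running work. */
static size_t sketch_flush(struct sketch_worker *w)
{
	size_t ran = 0;

	while (w->head < w->tail) {
		struct sketch_work *work = w->queue[w->head++];

		work->fn(work);
		ran++;
	}
	assert(w->head == w->tail);
	w->head = w->tail = 0;
	return ran;
}

/*
 * Model of the SIGKILL path, in the order the cover letter lists it:
 * flush queued works, then mark the worker dead so later queue attempts
 * fail. After this returns, the worker could exit.
 */
static size_t sketch_handle_sigkill(struct sketch_worker *w)
{
	size_t ran = sketch_flush(w);

	w->dead = true;
	return ran;
}

/* Example work function: just counts how many works have run. */
static int sketch_works_run;

static void sketch_count_work(struct sketch_work *work)
{
	(void)work;
	sketch_works_run++;
}
```

In this model, a completion that arrives after the kill (like the vhost-scsi LIO response or the vhost-net zerocopy DMA completion discussed above) gets `false` back from `sketch_queue_work()`, so the code that issued the I/O has to be ready to finish outstanding commands without the worker; that is the situation patches 1-5 prepare vhost-scsi for.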