On Sat, Mar 16, 2024 at 8:47 AM Mike Christie <michael.christie@xxxxxxxxxx> wrote:
>
> The following patches were made over Linus's tree and also apply over
> mst's vhost branch. The patches add the ability for vhost_tasks to
> handle SIGKILL by flushing queued works, stop new works from being
> queued, and prepare the task for an early exit.
>
> This removes the need for the signal/coredump hacks added in:
>
> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>
> when the vhost_task patches were initially merged and fix the issue
> in this thread:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
>
> Long Background:
>
> The original vhost worker code didn't support any signals. If the
> userspace application that owned the worker got a SIGKILL, the app/
> process would exit dropping all references to the device and then the
> file operation's release function would be called. From there we would
> wait on running IO then cleanup the device's memory.

A dumb question.

Is this a user space noticeable change? For example, with this series
a SIGKILL may shutdown the datapath ...

Thanks

>
> When we switched to vhost_tasks being a thread in the owner's process we
> added some hacks to the signal/coredump code so we could continue to
> wait on running IO and process it from the vhost_task. The idea was that
> we would eventually remove the hacks. We recently hit this bug:
>
> https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
>
> It turns out only vhost-scsi had an issue where it would send a command
> to the block/LIO layer, wait for a response and then process in the vhost
> task. So patches 1-5 prepares vhost-scsi to handle when the vhost_task
> is killed while we still have commands outstanding. The next patches then
> prepare and convert the vhost and vhost_task layers to handle SIGKILL
> by flushing running works, marking the vhost_task as dead so there's
> no future uses, then exiting.
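
For readers following the thread, the kill sequence described in the quoted cover letter (flush the works that are already queued, mark the task dead so nothing new is queued, then exit) can be modelled roughly as below. This is only a single-threaded userspace sketch, not the code from the series: `worker_model`, `worker_queue()`, `worker_flush()`, `worker_handle_kill()` and `count_work()` are hypothetical names invented for illustration, and the real vhost/vhost_task code would also need locking and a real signal check that are left out here.

```c
/* Userspace model only: none of these names are kernel or vhost APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKS 16

typedef void (*work_fn)(void *arg);

struct work {
    work_fn fn;
    void *arg;
};

struct worker_model {
    struct work queue[MAX_WORKS];
    size_t count;   /* works queued but not yet run */
    bool dead;      /* set once the kill has been handled */
};

/* Queue a work item; refuse it once the worker is marked dead or full. */
bool worker_queue(struct worker_model *w, work_fn fn, void *arg)
{
    if (w->dead || w->count == MAX_WORKS)
        return false;
    w->queue[w->count].fn = fn;
    w->queue[w->count].arg = arg;
    w->count++;
    return true;
}

/* Run every queued work item in FIFO order, then empty the queue. */
void worker_flush(struct worker_model *w)
{
    assert(w->count <= MAX_WORKS);
    for (size_t i = 0; i < w->count; i++)
        w->queue[i].fn(w->queue[i].arg);
    w->count = 0;
}

/* Model of the kill path: flush what is queued, then mark the worker dead. */
void worker_handle_kill(struct worker_model *w)
{
    worker_flush(w);
    w->dead = true;
}

/* Example work item: increment the int counter passed in. */
void count_work(void *arg)
{
    (*(int *)arg)++;
}
```

In this model, works queued before `worker_handle_kill()` still run to completion, while any `worker_queue()` call made afterwards returns false, which is the behaviour the question about the datapath is asking about.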