On Fri, Apr 12, 2024 at 12:19 AM Mike Christie <michael.christie@xxxxxxxxxx> wrote: > > On 4/11/24 3:39 AM, Jason Wang wrote: > > On Sat, Mar 16, 2024 at 8:47 AM Mike Christie > > <michael.christie@xxxxxxxxxx> wrote: > >> > >> The following patches were made over Linus's tree and also apply over > >> mst's vhost branch. The patches add the ability for vhost_tasks to > >> handle SIGKILL by flushing queued works, stop new works from being > >> queued, and prepare the task for an early exit. > >> > >> This removes the need for the signal/coredump hacks added in: > >> > >> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") > >> > >> when the vhost_task patches were initially merged and fix the issue > >> in this thread: > >> > >> https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/ > >> > >> Long Background: > >> > >> The original vhost worker code didn't support any signals. If the > >> userspace application that owned the worker got a SIGKILL, the app/ > >> process would exit dropping all references to the device and then the > >> file operation's release function would be called. From there we would > >> wait on running IO then cleanup the device's memory. > > > > A dumb question. > > > > Is this a user space noticeable change? For example, with this series > > a SIGKILL may shutdown the datapath ... > > It already changed in 6.4. We basically added a new interface to shutdown > everything (userspace and vhost kernel parts). So we won't just shutdown > the data path while userspace is still running. We will shutdown everything > now if you send a SIGKILL to a vhost worker's thread. If I understand correctly, for example Qemu can still live is SIGKILL is just send to vhost thread. If this is correct, guests may detect this (for example virtio-net has a watchdog). > > Here are a lots of details: > > - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal > to a vhost worker, we ignore it. Nothing happens. kthreads are special and > can ignore all signals. > > You could think of it as the worker is a completely different process than > qemu/userspace so they have completely different signal handlers. The > vhost worker signal handler ignores all signals even SIGKILL. Yes. > > If you send a SIGKILL to a qemu thread, then it just exits right away. We > don't get to do an explicit close() on the vhost device and we don't get > to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit > code runs and releases refcounts on the device/file, then the vhost device's > file_operations->release function is called. vhost_dev_cleanup then stops > the vhost worker. Right. > > - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker > can be thought of as a thread within the userspace process. With that > change we have the same signal handler as the userspace process. > > If you send a SIGKILL to a qemu thread then it works like above. > > If you send a SIGKILL to a vhost worker, the vhost worker still sort of > ignores it (that is the hack that I mentioned at the beginning of this > thread). kernel/vhost_task.c:vhost_task_fn will see the signal and > then just continue to process works until file_operations->release > calls Yes, so this sticks to the behaviour before vhost_tasks. > > However, the change in behavior is that because the worker is just a > thread within qemu, qemu is going to exit since they share a signal > handler and userspace can't ignore SIGKILL. > We then run perform the > steps above like in the pre-6.4 kernel description as if you sent a > SIGKILL directly to a userspace thread. > > - With the patches in this thread there is no major difference in behavior > with 6.4 and newer kernels. We might exit a little faster. Instead of the > vhost thread trying to do it's hacked up version of ignoring the signal > and waiting for userspace to exit and call file_operations->release, the > vhost worker thread will exit after it flushes works and stops new ones. > So now, you can have the vhost thread exiting in parallel with the > userspace thread. Everything would be fine if Qemu wanted to quit but I meant there could be a case where SIGKILL was sent to the vhost thread but not Qemu. Thanks >