Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting

Jason Wang <jasowang@xxxxxxxxxx> · Fri, 12 Apr 2024 11:28:13 +0800

On Fri, Apr 12, 2024 at 12:19 AM Mike Christie
<michael.christie@xxxxxxxxxx> wrote:
>
> On 4/11/24 3:39 AM, Jason Wang wrote:
> > On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> > <michael.christie@xxxxxxxxxx> wrote:
> >>
> >> The following patches were made over Linus's tree and also apply over
> >> mst's vhost branch. The patches add the ability for vhost_tasks to
> >> handle SIGKILL by flushing queued works, stop new works from being
> >> queued, and prepare the task for an early exit.
> >>
> >> This removes the need for the signal/coredump hacks added in:
> >>
> >> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
> >>
> >> when the vhost_task patches were initially merged and fix the issue
> >> in this thread:
> >>
> >> https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
> >>
> >> Long Background:
> >>
> >> The original vhost worker code didn't support any signals. If the
> >> userspace application that owned the worker got a SIGKILL, the app/
> >> process would exit dropping all references to the device and then the
> >> file operation's release function would be called. From there we would
> >> wait on running IO then cleanup the device's memory.
> >
> > A dumb question.
> >
> > Is this a user space noticeable change? For example, with this series
> > a SIGKILL may shutdown the datapath ...
>
> It already changed in 6.4. We basically added a new interface to shutdown
> everything (userspace and vhost kernel parts). So we won't just shutdown
> the data path while userspace is still running. We will shutdown everything
> now if you send a SIGKILL to a vhost worker's thread.

If I understand correctly, for example Qemu can still live is SIGKILL
is just send to vhost thread.

If this is correct, guests may detect this (for example virtio-net has
a watchdog).

>
> Here are a lots of details:
>
> - Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
> to a vhost worker, we ignore it. Nothing happens. kthreads are special and
> can ignore all signals.
>
> You could think of it as the worker is a completely different process than
> qemu/userspace so they have completely different signal handlers. The
> vhost worker signal handler ignores all signals even SIGKILL.

Yes.

>
> If you send a SIGKILL to a qemu thread, then it just exits right away. We
> don't get to do an explicit close() on the vhost device and we don't get
> to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
> code runs and releases refcounts on the device/file, then the vhost device's
> file_operations->release function is called. vhost_dev_cleanup then stops
> the vhost worker.

Right.

>
> - In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
> can be thought of as a thread within the userspace process. With that
> change we have the same signal handler as the userspace process.
>
> If you send a SIGKILL to a qemu thread then it works like above.
>
> If you send a SIGKILL to a vhost worker, the vhost worker still sort of
> ignores it (that is the hack that I mentioned at the beginning of this
> thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
> then just continue to process works until file_operations->release
> calls

Yes, so this sticks to the behaviour before vhost_tasks.

>
> However, the change in behavior is that because the worker is just a
> thread within qemu, qemu is going to exit since they share a signal
> handler and userspace can't ignore SIGKILL.
> We then run perform the
> steps above like in the pre-6.4 kernel description as if you sent a
> SIGKILL directly to a userspace thread.
>
> - With the patches in this thread there is no major difference in behavior
> with 6.4 and newer kernels. We might exit a little faster. Instead of the
> vhost thread trying to do it's hacked up version of ignoring the signal
> and waiting for userspace to exit and call file_operations->release, the
> vhost worker thread will exit after it flushes works and stops new ones.
> So now, you can have the vhost thread exiting in parallel with the
> userspace thread.

Everything would be fine if Qemu wanted to quit but I meant there
could be a case where SIGKILL was sent to the vhost thread but not
Qemu.

Thanks

>