Re: [PATCH 0/9] vhost: Support SIGKILL by flushing and exiting

Mike Christie <michael.christie@xxxxxxxxxx> · Thu, 11 Apr 2024 11:19:12 -0500

On 4/11/24 3:39 AM, Jason Wang wrote:
> On Sat, Mar 16, 2024 at 8:47 AM Mike Christie
> <michael.christie@xxxxxxxxxx> wrote:
>>
>> The following patches were made over Linus's tree and also apply over
>> mst's vhost branch. The patches add the ability for vhost_tasks to
>> handle SIGKILL by flushing queued works, stop new works from being
>> queued, and prepare the task for an early exit.
>>
>> This removes the need for the signal/coredump hacks added in:
>>
>> Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
>>
>> when the vhost_task patches were initially merged and fix the issue
>> in this thread:
>>
>> https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
>>
>> Long Background:
>>
>> The original vhost worker code didn't support any signals. If the
>> userspace application that owned the worker got a SIGKILL, the app/
>> process would exit dropping all references to the device and then the
>> file operation's release function would be called. From there we would
>> wait on running IO then cleanup the device's memory.
> 
> A dumb question.
> 
> Is this a user space noticeable change? For example, with this series
> a SIGKILL may shutdown the datapath ...

It already changed in 6.4. We basically added a new interface to shutdown
everything (userspace and vhost kernel parts). So we won't just shutdown
the data path while userspace is still running. We will shutdown everything
now if you send a SIGKILL to a vhost worker's thread.

Here are a lots of details:

- Pre-6.4 kernel, when vhost workers used kthreads, if you sent any signal
to a vhost worker, we ignore it. Nothing happens. kthreads are special and
can ignore all signals.

You could think of it as the worker is a completely different process than
qemu/userspace so they have completely different signal handlers. The
vhost worker signal handler ignores all signals even SIGKILL.

If you send a SIGKILL to a qemu thread, then it just exits right away. We
don't get to do an explicit close() on the vhost device and we don't get
to do ioctls like VHOST_NET_SET_BACKEND to clear backends. The kernel exit
code runs and releases refcounts on the device/file, then the vhost device's
file_operations->release function is called. vhost_dev_cleanup then stops
the vhost worker.

- In 6.4 and newer kernels, vhost workers use vhost_tasks, so the worker
can be thought of as a thread within the userspace process. With that
change we have the same signal handler as the userspace process.

If you send a SIGKILL to a qemu thread then it works like above.

If you send a SIGKILL to a vhost worker, the vhost worker still sort of
ignores it (that is the hack that I mentioned at the beginning of this
thread). kernel/vhost_task.c:vhost_task_fn will see the signal and
then just continue to process works until file_operations->release
calls 

However, the change in behavior is that because the worker is just a
thread within qemu, qemu is going to exit since they share a signal
handler and userspace can't ignore SIGKILL. We then run perform the
steps above like in the pre-6.4 kernel description as if you sent a
SIGKILL directly to a userspace thread.

- With the patches in this thread there is no major difference in behavior
with 6.4 and newer kernels. We might exit a little faster. Instead of the
vhost thread trying to do it's hacked up version of ignoring the signal
and waiting for userspace to exit and call file_operations->release, the
vhost worker thread will exit after it flushes works and stops new ones.
So now, you can have the vhost thread exiting in parallel with the
userspace thread.