Qemu-kvm creates several kernel threads for each VM, including
kvm-nx-lpage-re, vhost, and so on. These threads properly inherit the
cgroups of the calling process, so they are easily attached to the VMM
process's cgroups.

Kubernetes has a feature called Pod Overhead for accounting for the
resources consumed by the Pod infrastructure (e.g. the overhead brought
by qemu-kvm), and a sandbox container runtime usually creates a sandbox
or sandbox-overhead cgroup for this feature. By simply adding the
runtime or the VMM process to the sandbox's cgroup, the vhost and
kvm-nx-lpage-re threads successfully attach to the sandbox's cgroup,
but the kvm-pit thread does not. Moreover, in some scenarios the
kvm-pit thread can incur noticeable CPU overhead, so it is better to
let kvm-pit inherit the cgroups of the calling userspace process as
well.

By queuing the cgroup-attach work as the first work item after the
creation of the kvm-pit worker thread, the worker thread successfully
attaches to the calling process's cgroups.

Signed-off-by: Jietao Xiao <shawtao1125@xxxxxxxxx>
---
 arch/x86/kvm/i8254.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 0b65a764ed3a..c8dcfd6a9ed4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/slab.h>
+#include <linux/cgroup.h>
 
 #include "ioapic.h"
 #include "irq.h"
@@ -647,6 +648,32 @@ static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
 		kvm_pit_reset_reinject(pit);
 }
 
+struct pit_attach_cgroups_struct {
+	struct kthread_work work;
+	struct task_struct *owner;
+	int ret;
+};
+
+static void pit_attach_cgroups_work(struct kthread_work *work)
+{
+	struct pit_attach_cgroups_struct *attach;
+
+	attach = container_of(work, struct pit_attach_cgroups_struct, work);
+	attach->ret = cgroup_attach_task_all(attach->owner, current);
+}
+
+
+static int pit_attach_cgroups(struct kvm_pit *pit)
+{
+	struct pit_attach_cgroups_struct attach;
+
+	attach.owner = current;
+	kthread_init_work(&attach.work, pit_attach_cgroups_work);
+	kthread_queue_work(pit->worker, &attach.work);
+	kthread_flush_work(&attach.work);
+	return attach.ret;
+}
+
 static const struct kvm_io_device_ops pit_dev_ops = {
 	.read = pit_ioport_read,
 	.write = pit_ioport_write,
@@ -683,6 +710,10 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	if (IS_ERR(pit->worker))
 		goto fail_kthread;
 
+	ret = pit_attach_cgroups(pit);
+	if (ret < 0)
+		goto fail_attach_cgroups;
+
 	kthread_init_work(&pit->expired, pit_do_work);
 
 	pit->kvm = kvm;
@@ -723,6 +754,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 fail_register_pit:
 	mutex_unlock(&kvm->slots_lock);
 	kvm_pit_set_reinject(pit, false);
+fail_attach_cgroups:
 	kthread_destroy_worker(pit->worker);
 fail_kthread:
 	kvm_free_irq_source_id(kvm, pit->irq_source_id);
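
For reference, vhost handles the same inheritance problem with the same
first-work trick; below is a condensed sketch of that pattern from
drivers/vhost/vhost.c. Exact helper names (e.g. the flush call) have
varied across kernel versions, so treat this as illustrative rather
than authoritative:

struct vhost_attach_cgroups_struct {
	struct vhost_work work;
	struct task_struct *owner;
	int ret;
};

static void vhost_attach_cgroups_work(struct vhost_work *work)
{
	struct vhost_attach_cgroups_struct *s;

	/* Runs in the vhost worker thread, so "current" is the worker. */
	s = container_of(work, struct vhost_attach_cgroups_struct, work);
	s->ret = cgroup_attach_task_all(s->owner, current);
}

static int vhost_attach_cgroups(struct vhost_dev *dev)
{
	struct vhost_attach_cgroups_struct attach;

	/* Queue the attach as the worker's first work item and wait. */
	attach.owner = current;
	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
	vhost_work_queue(dev, &attach.work);
	vhost_work_flush(dev, &attach.work);
	return attach.ret;
}

--
2.20.1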