On 5/23/19 9:22 AM, Daniel P. Berrangé wrote:
> On Wed, May 22, 2019 at 05:16:38PM -0600, Jim Fehlig wrote:
>> Hi All,
>>
>> I recently received an internal bug report of a VM "crashing" due to
>> hitting thread limits. It seems there was an assert in pthread_create()
>> within the VM when hitting the limit enforced by the pids controller on
>> the host:
>>
>> Apr 28 07:45:46 lpcomp02007 kernel: cgroup: fork rejected by pids
>> controller in /machine.slice/machine-qemu\x2d90028\x2dinstance\x2d0000634b.scope
>>
>> The user has TasksMax set to infinity in machine.slice, but apparently
>> that is not inherited by child scopes and appears to be hardcoded to
>> 16384:
>>
>> https://github.com/systemd/systemd/blob/51aba17b88617515e037e8985d3a4ea871ac47fe/src/machine/machined-dbus.c#L1344
>>
>> The TasksMax property can be set when creating the machine, as done in
>> the attached proof-of-concept patch. The question is whether this should
>> be a tunable. My initial thought when seeing the report was that TasksMax
>> could be calculated based on the number of vcpus, iothreads, emulator
>> threads, etc., but it appears that could be quite tricky. The following
>> mail thread describes the basic scenario encountered by my user:
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008174.html
>>
>> As you can see, many rbd images attached to a VM can result in an awful
>> lot of threads - 300 images could result in 720K threads! We could punt
>> and set the limit to infinity, but it exists for a reason - fork bomb
>> prevention. A potential compromise between a hardcoded value and a per-VM
>> tunable is a driver tunable in qemu.conf. If a per-VM tunable is
>> preferred, suggestions on where to place it and what to call it would be
>> much appreciated :-).
>
> Yeah, RBD is problematic as you can't predict how many threads it will use.
>
> We currently have a "max_processes" setting in qemu.conf for the
> ulimit-based process limit. This applies to the user as a whole though,
> not the cgroup.
>
> On Fedora we don't seem to have any "tasks_max" cgroup setting or TasksMax
> systemd setting, at least when running with cgroups v1, so we can't set
> that unconditionally.
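For what it's worth, below is roughly the kind of calculation I had in mind
before concluding it was too tricky. This is a hypothetical sketch only - the
helper doesn't exist in the tree, and the per-image constant is precisely the
unpredictable part:

#include <stdint.h>

/* Hypothetical sketch, not proposed code: a per-VM TasksMax estimate.
 * The per-image constant comes from the ceph-users thread above
 * (300 rbd images -> ~720K threads, i.e. ~2400 threads per image),
 * but in reality librbd's thread count depends on the cluster, so
 * any constant here is guesswork. */
static uint64_t
qemuEstimateTasksMax(unsigned int nvcpus,
                     unsigned int niothreads,
                     unsigned int nrbdimages)
{
    uint64_t tasks = 0;

    tasks += nvcpus;                /* one thread per vcpu */
    tasks += niothreads;            /* explicitly configured iothreads */
    tasks += 16;                    /* emulator, migration, RCU, etc. */
    tasks += nrbdimages * 2400ULL;  /* librbd workers: pure guesswork */

    return tasks * 2;               /* headroom */
}

With numbers like that, any fixed constant is either wasteful or wrong, hence
looking for a tunable instead.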
AFAICT, the TasksMax scope property maps to pids.max in the pids controller
hierarchy. E.g. with the hardcoded 32k value in the POC patch:

# cat /sys/fs/cgroup/pids/machine.slice/machine-qemu\\x2d2\\x2dsles15.scope/pids.max
32768
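As a sanity check that it really is the same knob, the property can also be
poked on a running scope. An untested sd-bus sketch (the same path that
`systemctl set-property <scope> TasksMax=...` takes; the unit name and 64k
value are just examples):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <systemd/sd-bus.h>

/* Untested sketch: raise TasksMax on an already-running scope via
 * systemd's SetUnitProperties D-Bus call. Build with -lsystemd. */
int main(void)
{
    sd_bus *bus = NULL;
    sd_bus_error error = SD_BUS_ERROR_NULL;
    const char *unit = "machine-qemu\\x2d2\\x2dsles15.scope";
    int r;

    r = sd_bus_open_system(&bus);
    if (r < 0)
        goto out;

    r = sd_bus_call_method(bus,
                           "org.freedesktop.systemd1",
                           "/org/freedesktop/systemd1",
                           "org.freedesktop.systemd1.Manager",
                           "SetUnitProperties",
                           &error, NULL,
                           "sba(sv)",
                           unit,
                           1,   /* runtime only, don't persist */
                           1,   /* one property follows */
                           "TasksMax", "t", UINT64_C(65536));

 out:
    if (r < 0)
        fprintf(stderr, "failed: %s\n",
                error.message ? error.message : strerror(-r));
    sd_bus_error_free(&error);
    sd_bus_unref(bus);
    return r < 0;
}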
Regards,
Jim

> I'd be inclined to have a new qemu.conf setting "max_tasks". If this is
> set to 0, then we should just set TasksMax to infinity, otherwise honour
> the setting.
>
>> From 0583ee3b26b2ee43efe8d25226eceb8547400d97 Mon Sep 17 00:00:00 2001
>> From: Jim Fehlig <jfehlig@xxxxxxxx>
>> Date: Wed, 22 May 2019 17:12:14 -0600
>> Subject: [PATCH] systemd: set TasksMax when calling CreateMachine
>>
>> An example of how to set TasksMax when creating a scope for a machine.
>>
>> Signed-off-by: Jim Fehlig <jfehlig@xxxxxxxx>
>> ---
>>  src/util/virsystemd.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c
>> index 3f03e3bd63..6177447bdb 100644
>> --- a/src/util/virsystemd.c
>> +++ b/src/util/virsystemd.c
>> @@ -341,10 +341,11 @@ int virSystemdCreateMachine(const char *name,
>>                             (unsigned int)pidleader,
>>                             NULLSTR_EMPTY(rootdir),
>>                             nnicindexes, nicindexes,
>> -                           3,
>> +                           4,
>>                             "Slice", "s", slicename,
>>                             "After", "as", 1, "libvirtd.service",
>> -                           "Before", "as", 1, "virt-guest-shutdown.target") < 0)
>> +                           "Before", "as", 1, "virt-guest-shutdown.target",
>> +                           "TasksMax", "t", UINT64_C(32768)) < 0)
>>          goto cleanup;
>>
>>      if (error.level == VIR_ERR_ERROR) {
>> @@ -382,10 +383,11 @@ int virSystemdCreateMachine(const char *name,
>>                             iscontainer ? "container" : "vm",
>>                             (unsigned int)pidleader,
>>                             NULLSTR_EMPTY(rootdir),
>> -                           3,
>> +                           4,
>>                             "Slice", "s", slicename,
>>                             "After", "as", 1, "libvirtd.service",
>> -                           "Before", "as", 1, "virt-guest-shutdown.target") < 0)
>> +                           "Before", "as", 1, "virt-guest-shutdown.target",
>> +                           "TasksMax", "t", UINT64_C(32768)) < 0)
>>          goto cleanup;
>>
>>      }
>> --
>> 2.21.0
>
> Regards,
> Daniel
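For reference, a sketch of how that 0-means-infinity mapping could look
(helper name hypothetical; systemd represents TasksMax=infinity as
UINT64_MAX, i.e. (uint64_t)-1):

#include <stdint.h>

/* Hypothetical handling of a new qemu.conf "max_tasks" setting:
 * 0 means no limit, which maps to TasksMax=infinity. systemd
 * encodes infinity as UINT64_MAX on the D-Bus property. */
static uint64_t
qemuTasksMaxFromConfig(unsigned long long max_tasks)
{
    if (max_tasks == 0)
        return UINT64_MAX;   /* TasksMax=infinity */

    return (uint64_t)max_tasks;
}

virSystemdCreateMachine() would then receive this value in place of the
hardcoded UINT64_C(32768) from the POC patch.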