Hi, On Mon, May 11, 2009 at 10:26 AM, Ryota Ozaki <ozaki.ryota@xxxxxxxxx> wrote: > Hi, > > On Sun, May 10, 2009 at 8:49 AM, Ryota Ozaki <ozaki.ryota@xxxxxxxxx> wrote: >> Hi Serge and Daniel, >> >> On Sat, May 9, 2009 at 4:03 AM, Serge E. Hallyn <serue@xxxxxxxxxx> wrote: >>> Quoting Daniel P. Berrange (berrange@xxxxxxxxxx): >>>> On Fri, May 08, 2009 at 08:34:12AM -0500, Serge E. Hallyn wrote: >>>> > Quoting Ryota Ozaki (ozaki.ryota@xxxxxxxxx): >>>> > > Hi Serge, >>>> > > >>>> > > On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn <serue@xxxxxxxxxx> wrote: >>>> > > > IIUC, the real problem is that src/cgroup.c assumes that the >>>> > > > cgroup name should be $CGROUP_MOUNTPOINT/groupname. But of >>>> > > > course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS) >>>> > > > to create a new namespace in which to mount the new devpts >>>> > > > locks the driver under $CGROUP_MOUNTPOINT/<pid_of_driver>/ >>>> > > > or somesuch. >>>> > > > >>>> > > > If this fixes the problem I have no objections, but it seems >>>> > > > more fragile than perhaps trying to teach src/cgroup.c to >>>> > > > consider it's current cgroup as a starting point. >>>> > > >>>> > > hmm, I don't know why the assumption is bad and how the approach >>>> > > you are suggesting helps the ns problem. >>>> > >>>> > To be clear, the asssumption is that the driver starts in the >>>> > root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks. >>>> > And that it can create $CGROUP_MOUNTPOINT/groupname and move >>>> > itself into $CGROUP_MOUNTPOINT/groupname/tasks. >>>> > >>>> > So, the assumption is bad because when the driver does a >>>> > unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X, >>>> > and after that can only move itself into >>>> > $CGROUP_MOUNTPOINT/X/groupname. >>>> > >>>> > Even with your patch, it's possible for the lxc driver to have >>>> > been started under say $CGROUP_MOUNTPOINT/libvir or >>>> > $CGROUP_MOUNTPOINT/<username> through libcgroup/PAM for instance, >>>> > in which case your patch would be insufficient. >>>> >>>> Indeed, we can't assume we're in the root group at any time. Our >>>> general plan is that libvirt itself uses 3 levels of cgroup >>>> starting at the cgroup that libvirtd was placed in by the admin >>>> of the host, then a per-driver group, then per-guest groups >>>> >>>> $LIBVIRTD_ROOT_CGROUP >>>> | >>>> +- lxc >>>> | | >>>> | +- LXC-GUEST-1 >>>> | +- LXC-GUEST-2 >>>> | +- LXC-GUEST-3 >>>> | +- ... >>>> | >>>> +- qemu >>>> | >>>> +- QEMU-GUEST-1 >>>> +- QEMU-GUEST-2 >>>> +- QEMU-GUEST-3 >>>> +- ... >>>> >>>> $LIBVIRTD_ROOT_CGROUP, may be the actaul root mount point for >>>> the cgroup controller in question, or it may be a sub-directory >>>> that the admin chose to put it in. Also, if running libvirtd >>>> as a normal user, the admin may have created per-user account >>>> cgroups and so libvirtd would end up in there. So we have to >>>> be prepared for anything wrt initial libvirtd cgroup root. >>>> >>>> NB The code for putting QEMU in a cgroup isn't merged yet. >>> >>> That sounds good. I just don't think the code currently does >>> it. To do the right thing, IIUC, virCgroupPathOfGroup() should >>> parse the /proc/pid/cgroup contents of the 'controller' process, >>> and insert that between the virCgroupGetMount(controller) >>> result and the group name. >>> >>> Or something... >>> >>> (right?) >>> >>> thanks, >>> -serge >>> >> >> OK I probably understood the problem and wrote a patch for that >> according to the Serge's suggestion. >> >> I expect this patch fixes the problem, however unfortunately >> I've not tried it yet under the situation you are worrying and >> just done regression tests. Because I don't know easy way to >> produce the situation. If you know that, please let me know... > > I found a way to go. lxc-unshare helps me and unveils my second > path is broken... > > ozaki-r > >> >> Thanks, >> ozaki-r > I've updated the patch. The change includes support for multiple mount points of cgroups that I didn't cope with in the previous patch. Through the work, I found a bit messy problem. Current lxc controller writes pid in a 'tasks' file multiple times if one mount point has multiple subsystems. It is bad because the first write changes the cgroups path of a controller, and then the second write points a missing file like $CGROUPS_MOUNTPOINT/<path_to_domain>/<path_to_domain>/tasks where the correct file is $CGROUPS_MOUNTPOINT/<path_to_domain>/tasks. I did workaround this problem with a tricky way by truncating the duplicated path. We probably need a more feasible solution. Note that this patch intends a proof of concept to be clear the correct way to go and thus it remains a couple of awkward codes. Of course the codes should be removed if we find the correct way. Thanks, ozaki-r >From 473183a77fbdb002d55acfed0d8f9bd485f548cd Mon Sep 17 00:00:00 2001 From: Ryota Ozaki <ozaki.ryota@xxxxxxxxx> Date: Mon, 11 May 2009 15:18:47 +0900 Subject: [PATCH] [v2] cgroups: address libvirtd's bad assumption for cgroups hierarchy --- src/cgroup.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 85 insertions(+), 2 deletions(-) diff --git a/src/cgroup.c b/src/cgroup.c index 50517e2..58ba588 100644 --- a/src/cgroup.c +++ b/src/cgroup.c @@ -117,12 +117,76 @@ int virCgroupHaveSupport(void) return 0; } +static int virCgroupGetCurrentPath(int pid, const char *controller, char **curpath) +{ + char buf[(PATH_MAX+256)*9]; /* there are nine subsystems */ + char *path = NULL; + int rc = 0; + int len; + int fd; + char *ctrlpos = NULL; + char *pathpos = NULL; + char *brkpos = NULL; + + if (virAsprintf(&path, "/proc/%d/cgroup", pid) == -1) { + rc = -ENOMEM; + goto error; + } + + fd = open(path, O_RDONLY); + if (fd < 0) { + DEBUG("Unable to open %s: %m", path); + rc = -errno; + goto error_open; + } + + len = saferead(fd, buf, sizeof(buf)); + if (len <= 0) { + rc = -errno; + goto error_saferead; + } + + ctrlpos = strstr(buf, controller); + if (ctrlpos == NULL) { + rc = -1; + goto error_strstr; + } + + pathpos = strstr(ctrlpos, ":/"); + if (pathpos == NULL) { + rc = -1; + goto error_strstr; + } + pathpos += 2; /* skip ":/" */ + + brkpos = strstr(ctrlpos, "\n"); + if (brkpos == NULL) { + rc = -1; + goto error_strstr; + } + *brkpos = '\0'; /* overwrite '\n' */ + + if (virAsprintf(curpath, "%s", pathpos) == -1) { + rc = -ENOMEM; + } + +error_strstr: +error_saferead: + close(fd); +error_open: + VIR_FREE(path); +error: + return rc; +} + static int virCgroupPathOfGroup(const char *group, const char *controller, char **path) { virCgroupPtr root = NULL; + char *curpath = NULL; int rc = 0; + char *pos; root = virCgroupGetMount(controller); if (root == NULL) { @@ -130,8 +194,20 @@ static int virCgroupPathOfGroup(const char *group, goto out; } - if (virAsprintf(path, "%s/%s", root->path, group) == -1) + rc = virCgroupGetCurrentPath(getpid(), controller, &curpath); + if (rc < 0) + goto out; + + /* XXX: remove duplicated path */ + pos = strstr(curpath, group); + if (pos) { + *pos = '\0'; + } + + if (virAsprintf(path, "%s/%s/%s", root->path, curpath, group) == -1) rc = -ENOMEM; + + VIR_FREE(curpath); out: virCgroupFree(&root); @@ -145,6 +221,7 @@ static int virCgroupPathOf(const char *grppath, virCgroupPtr root; int rc = 0; char *controller = NULL; + char *curpath = NULL; if (strchr(key, '.') == NULL) return -EINVAL; @@ -158,8 +235,14 @@ static int virCgroupPathOf(const char *grppath, goto out; } - if (virAsprintf(path, "%s/%s/%s", root->path, grppath, key) == -1) + rc = virCgroupGetCurrentPath(getpid(), controller, &curpath); + if (rc < 0) + goto out; + + if (virAsprintf(path, "%s/%s/%s/%s", root->path, curpath, grppath, key) == -1) rc = -ENOMEM; + + VIR_FREE(curpath); out: virCgroupFree(&root); VIR_FREE(controller); -- 1.6.0.6 -- Libvir-list mailing list Libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list