On 13.04.2016 23:27, Dave Anderson wrote:
>
>
> ----- Original Message -----
>> Initial version of a crash module which can be used to show which cgroups
>> a process is a member of.
>>
>> Signed-off-by: Nikolay Borisov <n.borisov.lkml@xxxxxxxxx>
>> ---
>>
>> So here is the second version of the proccgroup module. Changes since v1:
>>
>> * Now show the full path to the cgroup (limited to 4k long paths).
>> * Added support for passing either pid or hex address of task struct, so that
>>   cgroup info can be acquired for an arbitrary task
>> * Added support for pre-3.15 kernels
>> * Removed leftovers from the echo module
>
>
> Hello Nikolay,
>
> While cgroups have existed since 2.6.24, it appears that cgroup.name
> was introduced in 3.10, and cgroup.kn in 3.15.  So I have only a
> limited set of sample 3.10+ dumpfiles that I could test it on.
>
> I have many 3.10-based RHEL7 kernels, and the same error occurs on
> all of them:
>
>   crash> sys | grep RELEASE
>        RELEASE: 3.10.0-327.el7.x86_64
>   crash> showcg
>   showcg: invalid kernel virtual address: ff88046666e03060  type: "cgroup_subsys->name"
>   crash>
>
> The bad address looks to come from this line:
>
>   readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
>           "cgroup_subsys_state->ss", FAULT_ON_ERROR);
>
> because the 3.10 kernel does not have a cgroup_subsys_state.ss field, which was
> added in 4.2:

It was actually added in 3.12.

>
>   crash> cgroup_subsys_state
>   struct cgroup_subsys_state {
>       struct cgroup *cgroup;
>       atomic_t refcnt;
>       unsigned long flags;
>       struct css_id *id;
>       struct work_struct dput_work;
>   }
>   SIZE: 64
>   crash>
>
> Unfortunately you don't have the benefit of being able to use OFFSET(), which
> would fail immediately.  MEMBER_OFFSET() returns -1 on invalid requests, so you
> really have to verify the return value, or add it to your MEMBER_OFFSET() verifications
> during your init function.
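A minimal sketch of the return-value check described in the quoted paragraph above. It does not use the real crash API: fake_member_offset() is a hypothetical stand-in for MEMBER_OFFSET(), backed by a small table that models the 3.10 cgroup_subsys_state shown in the dump (no "ss" field), and member_address() is an illustrative helper name. The offsets assume an x86_64 layout and are only there to make the example concrete.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for crash's MEMBER_OFFSET(): returns the byte
 * offset of a struct member, or -1 when the member is unknown.
 * The table models a 3.10-style cgroup_subsys_state, which has no "ss"
 * field; the offsets are illustrative (x86_64 layout assumed). */
struct member_info {
    const char *type;
    const char *member;
    long offset;
};

static const struct member_info known_members[] = {
    { "cgroup_subsys_state", "cgroup", 0 },
    { "cgroup_subsys_state", "refcnt", 8 },
    { "cgroup_subsys_state", "flags", 16 },
};

static long fake_member_offset(const char *type, const char *member)
{
    size_t i;

    for (i = 0; i < sizeof(known_members) / sizeof(known_members[0]); i++) {
        if (strcmp(known_members[i].type, type) == 0 &&
            strcmp(known_members[i].member, member) == 0)
            return known_members[i].offset;
    }
    return -1;
}

/* Computes base + offset only when the member exists, so that a -1
 * offset is never silently added to the base address.
 * Returns 1 and stores the result in *addr on success; returns 0 and
 * leaves *addr untouched when the member is missing. */
static int member_address(unsigned long base, const char *type,
                          const char *member, unsigned long *addr)
{
    long off = fake_member_offset(type, member);

    if (off < 0)
        return 0;
    *addr = base + (unsigned long)off;
    return 1;
}
```

In the extension itself, an equivalent test would presumably sit in front of each readmem() call that depends on an optional member, or be done once in the init function so the command can skip the subsystem name on kernels that lack the field.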
I guess on pre-3.12 kernels I will just skip printing the name of the
subsystem. I will take a brief look at whether I could recreate the logic
in the module rather than relying on traversing structs, but I don't
consider this high priority.

>
> And there were these oddities on later kernel versions:
>
> All 3 of my sample 3.13-based Fedora kernels result in this output:
>
>   crash> sys | grep RELEASE
>        RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
>   crash> showcg
>   subsys: cpuset               cgroup: /
>   subsys: cpu                  cgroup: /
>   subsys: cpuacct              cgroup: /
>   subsys: memory               cgroup: /
>   subsys: devices              cgroup: /
>   subsys: freezer              cgroup: /
>   subsys: net_cls              cgroup: /
>   subsys: blkio                cgroup: /
>   subsys: perf_event           cgroup: /
>   subsys: hugetlb              cgroup: /
>   showcg: invalid kernel virtual address: 0  type: "cgroup_subsys_state->cgroup"
>   crash>
>
> I didn't look into why they all end that way.  Maybe there's a NULL pointer in the
> last entry in the subsys array?

I will have to test this on a 3.13 kernel.

>
> And lastly, I only have one 3.14-based kernel, which shows this:
>
>   crash> sys | grep RELEASE
>        RELEASE: 3.14.0-rc1+
>   crash> showcg
>   showcg: zero-size memory allocation! (called from 7f3280273719)
>   crash>
>
> which would come from a cgroup_subsys_arr allocation size of 0 here:
>
>         en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
>         cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
>
> which depends upon CGROUP_SUBSYS_COUNT being something non-zero:
>         /*
>          * Set of subsystem states, one for each subsystem. This array is
>          * immutable after creation apart from the init_css_set during
>          * subsystem registration (at boot time).
>          */
>         struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
>
> And in that kernel apparently CONFIG_CGROUPS was not configured and
> therefore CGROUP_SUBSYS_COUNT is 0:

But there is already logic in the initialization routine which should
handle cases where CONFIG_CGROUPS is not selected, simply by checking
whether the "cgroups" member in task_struct exists. I checked on LXR and
this member has always been protected by #ifdef CONFIG_CGROUPS. Maybe
this is Fedora kernel specific? Can you please check whether the
'cgroups' member in the definition of task_struct is protected by an
ifdef guard? I can easily augment the check to consider the size of the
subsys array. I tested the code on 3.12 and on !CONFIG_CGROUPS the
extension correctly bails out.

>
>   #else /* CONFIG_CGROUPS */
>
>   #define CGROUP_SUBSYS_COUNT 0
>
>   static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
>   static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
>
>   #endif /* CONFIG_CGROUPS */
>
> making it an empty structure:
>
>   crash> css_set
>   struct css_set {
>       atomic_t refcount;
>       struct hlist_node hlist;
>       struct list_head tasks;
>       struct list_head cgrp_links;
>       struct cgroup_subsys_state *subsys[];
>       struct callback_head callback_head;
>   }
>   SIZE: 72
>   crash> css_set -o
>   struct css_set {
>      [0] atomic_t refcount;
>      [8] struct hlist_node hlist;
>     [24] struct list_head tasks;
>     [40] struct list_head cgrp_links;
>     [56] struct cgroup_subsys_state *subsys[];
>     [56] struct callback_head callback_head;
>   }
>   SIZE: 72
>   crash>
>
> The other 3.18 and 4.x based kernels ran the command OK.
>
> Another thing I might suggest if your idea is to assist in the
> actual debugging of cgroup problems -- would be to print the
> address of key data structures as part of the command's output.
> That kind of thing is done by most crash commands, so that a user
> can quickly dump, for example, the target cgroup structure, or
> perhaps some of the other structures that would be helpful to
> fully display.
>
> On the other hand, maybe all you're interested in seeing is the
> cgroup name and path?  I don't know -- that's up to you.

For now my intention is to have a quick way to know which cgroups a
process is a member of. If someone can provide a use case showing which
addresses might be useful, I will consider adding those.

>
> Also, you don't have to post your module as a patch to the
> extensions subdirectory.  I'm not going to add the file to the
> crash sources contained in the tar.gz or src.rpm releases, but
> rather I will post your module source file, and directions on
> how to build it, on the extensions web page accessible from
> http://people.redhat.com/anderson/extensions.html.  So you can
> just attach the module's C file to your email to this mailing list.

Ok, will keep this in mind for my next posting. Thanks a lot for the
detailed and helpful feedback!

>
> Thanks,
>   Dave
>
>
>
>
>>
>>  extensions/proccgroup.c | 278 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 278 insertions(+)
>>  create mode 100644 extensions/proccgroup.c
>>
>> diff --git a/extensions/proccgroup.c b/extensions/proccgroup.c
>> new file mode 100644
>> index 0000000..aee735b
>> --- /dev/null
>> +++ b/extensions/proccgroup.c
>> @@ -0,0 +1,278 @@
>> +/*
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Nikolay Borisov <n.borisov.lkml@xxxxxxxxx>
>> + */
>> +
>> +#include <stdbool.h>
>> +#include "defs.h"
>> +
>> +#define MAX_CGROUP_PATH 4096
>> +
>> +static void showcgrp(void);
>> +char *help_proc_cgroups[];
>> +
>> +static struct command_table_entry command_table[] = {
>> +    { "showcg", showcgrp, help_proc_cgroups, 0},
>> +    { NULL },
>> +};
>> +
>> +
>> +void __attribute__((constructor))
>> +proccgroup_init(void)
>> +{
>> +
>> +    if (!MEMBER_EXISTS("task_struct", "cgroups") ||
>> +        (!MEMBER_EXISTS("cgroup", "kn") && !MEMBER_EXISTS("cgroup", "name")))
>> +    {
>> +        fprintf(fp, "Unrecognised or disabled cgroup support\n");
>> +        return;
>> +    }
>> +
>> +    register_extension(command_table);
>> +}
>> +
>> +void __attribute__((destructor))
>> +proccgroup_finish(void) { }
>> +
>> +/* Prepends contents of cgroup_name to buf, using start as a pointer
>> + * index into buf
>> + */
>> +static void prepend_string(char *buf, char **start, char *cgroup_name) {
>> +
>> +    int len = strlen(cgroup_name);
>> +    *start -= len;
>> +
>> +    if (*start < buf) {
>> +        error(FATAL, "Cgroup too long to parse\n");
>> +    }
>> +
>> +    memcpy(*start, cgroup_name, len);
>> +
>> +    if (--*start < buf) {
>> +        error(FATAL, "Cgroup too long to parse\n");
>> +    }
>> +
>> +    **start = '/';
>> +}
>> +
>> +/* For post-3.15 kernels */
>> +static void get_cgroup_name_kn(ulong cgroup, char *buf, int buflen)
>> +{
>> +    ulong kernfs_node;
>> +    ulong cgroup_name_ptr;
>> +    ulong kernfs_parent;
>> +    bool slash_prepended = false;
>> +    char cgroup_name[BUFSIZE];
>> +    char *start = buf + buflen - 1;
>> +    *start = '\0'; //null terminate the end
>> +
>> +    /* Get cgroup->kn */
>> +    readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node, sizeof(void *),
>> +        "cgroup->kn", FAULT_ON_ERROR);
>> +
>> +    do {
>> +        /* Get kn->name */
>> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> +            "kernfs_node->name", FAULT_ON_ERROR);
>> +        /* Get kn->parent */
>> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"), KVADDR, &kernfs_parent, sizeof(void *),
>> +            "kernfs_node->parent", FAULT_ON_ERROR);
>> +
>> +        if (kernfs_parent != 0) {
>> +            read_string(cgroup_name_ptr, cgroup_name, BUFSIZE-1);
>> +            prepend_string(buf, &start, cgroup_name);
>> +            slash_prepended = true;
>> +        } else if (!slash_prepended) {
>> +            if (--start < buf) {
>> +                error(FATAL, "Cgroup too long to parse\n");
>> +            }
>> +            *start = '/';
>> +        }
>> +
>> +        kernfs_node = kernfs_parent;
>> +
>> +    } while(kernfs_parent);
>> +
>> +    memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +/* For pre-3.15 kernels */
>> +static void get_cgroup_name_old(ulong cgroup, char *buf, size_t buflen)
>> +{
>> +    ulong cgroup_name_ptr;
>> +    ulong cgroup_parent_ptr;
>> +    char cgroup_name[BUFSIZE];
>> +    char *start = buf + buflen - 1;
>> +    *start = '\0'; //null terminate the end
>> +    bool slash_prepended = false;
>> +
>> +    do {
>> +        /* Get cgroup->name */
>> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> +            "cgroup->name", FAULT_ON_ERROR);
>> +        /* Get cgroup->parent */
>> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "parent"), KVADDR, &cgroup_parent_ptr, sizeof(void *),
>> +            "cgroup->parent", FAULT_ON_ERROR);
>> +
>> +        read_string(cgroup_name_ptr + MEMBER_OFFSET("cgroup_name", "name"), cgroup_name, BUFSIZE-1);
>> +
>> +        if (cgroup_parent_ptr) {
>> +            prepend_string(buf, &start, cgroup_name);
>> +            slash_prepended = true;
>> +        } else if (!slash_prepended) {
>> +            if (--start < buf)
>> +                break;
>> +            *start = '/';
>> +        }
>> +
>> +        cgroup = cgroup_parent_ptr;
>> +
>> +    } while(cgroup_parent_ptr);
>> +
>> +    memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +static void get_subsys_name(ulong subsys, char *buf, size_t buflen)
>> +{
>> +    ulong subsys_name_ptr;
>> +    ulong cgroup_subsys_ptr;
>> +
>> +    /* Get cgroup->kn */
>> +    readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
>> +        "cgroup_subsys_state->ss", FAULT_ON_ERROR);
>> +
>> +    readmem(cgroup_subsys_ptr + MEMBER_OFFSET("cgroup_subsys", "name"), KVADDR, &subsys_name_ptr, sizeof(void *),
>> +        "cgroup_subsys->name", FAULT_ON_ERROR);
>> +    read_string(subsys_name_ptr, buf, buflen-1);
>> +}
>> +
>> +static void get_cgroup_name(ulong cgroup, ulong subsys)
>> +{
>> +    char *cgroup_path = GETBUF(MAX_CGROUP_PATH);
>> +    char subsys_name[BUFSIZE];
>> +
>> +    /* Handle the 2 cases of cgroup_name and the kernfs one */
>> +    if (MEMBER_EXISTS("cgroup", "kn")) {
>> +        get_cgroup_name_kn(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> +    } else if (MEMBER_EXISTS("cgroup", "name")) {
>> +        get_cgroup_name_old(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> +    }
>> +
>> +    get_subsys_name(subsys, subsys_name, BUFSIZE);
>> +
>> +    fprintf(fp, "subsys: %-20s cgroup: %s\n", subsys_name, cgroup_path);
>> +
>> +    FREEBUF(cgroup_path);
>> +}
>> +
>> +
>> +void show_proc_cgroups(ulong task_ctx) {
>> +    int en_subsys_cnt;
>> +    int i;
>> +    ulong *cgroup_subsys_arr;
>> +    ulong subsys_base_ptr;
>> +    ulong cgroups_subsys_ptr = 0;
>> +
>> +
>> +    /* Get address of task_struct->cgroups */
>> +    readmem(task_ctx + MEMBER_OFFSET("task_struct", "cgroups"),
>> +        KVADDR, &cgroups_subsys_ptr, sizeof(void *),
>> +        "task_struct->cgroups", FAULT_ON_ERROR);
>> +
>> +    subsys_base_ptr = cgroups_subsys_ptr + MEMBER_OFFSET("css_set", "subsys");
>> +    en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
>> +    cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
>> +
>> +    /* Get the contents of the css_set->subsys array */
>> +    readmem(subsys_base_ptr, KVADDR, cgroup_subsys_arr, sizeof(ulong) * en_subsys_cnt,
>> +        "css_set->subsys", FAULT_ON_ERROR);
>> +
>> +    for (i = 0; i < en_subsys_cnt; i++) {
>> +        ulong cgroup;
>> +
>> +        /* Get cgroup_subsys_state -> cgroup */
>> +        readmem(cgroup_subsys_arr[i] + MEMBER_OFFSET("cgroup_subsys_state", "cgroup"),
>> +            KVADDR, &cgroup, sizeof(void *), "cgroup_subsys_state->cgroup", FAULT_ON_ERROR);
>> +
>> +        get_cgroup_name(cgroup, cgroup_subsys_arr[i]);
>> +    }
>> +
>> +    FREEBUF(cgroup_subsys_arr);
>> +}
>> +
>> +
>> +static void showcgrp(void) {
>> +
>> +    ulong value;
>> +    struct task_context *tc;
>> +    ulong task_struct_ptr = 0;
>> +
>> +    while (args[++optind]) {
>> +        if (IS_A_NUMBER(args[optind])) {
>> +            switch (str_to_context(args[optind], &value, &tc))
>> +            {
>> +            case STR_PID:
>> +                task_struct_ptr = tc->task;
>> +                ++optind;
>> +                break;
>> +
>> +            case STR_TASK:
>> +                task_struct_ptr = value;
>> +                ++optind;
>> +                break;
>> +
>> +            case STR_INVALID:
>> +                error(FATAL, "invalid task or pid value: %s\n\n",
>> +                    args[optind]);
>> +                break;
>> +            }
>> +        } else {
>> +            if (argcnt > 1)
>> +                error(FATAL, "invalid task or pid value: %s\n",args[optind]);
>> +            else
>> +                break;
>> +        }
>> +    }
>> +
>> +    if (!task_struct_ptr) {
>> +        task_struct_ptr = CURRENT_TASK();
>> +    }
>> +
>> +    show_proc_cgroups(task_struct_ptr);
>> +}
>> +
>> +char *help_proc_cgroups[] = {
>> +    "showcg",
>> +    "Show which cgroups is a process member of",
>> +    " [task | pid]",
>> +
>> +    "  This command prints the cgroup for each subsys that a process is a member of",
>> +    "\nExample",
>> +    "  Show the cgroup for the currently active process:\n",
>> +    "    crash> showcg",
>> +    "    subsys: cpuset               cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: cpu                  cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: cpuacct              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: blkio                cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: memory               cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: devices              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: freezer              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: net_cls              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: perf_event           cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: net_prio             cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "    subsys: hugetlb              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +    "\n  Alternatively you can pass either a pid or a task pointer to show the cgroup the",
>> +    "  respective process is a member of e.g:\n",
>> +    "    crash> showcg 1064\n    OR",
>> +    "    crash> showcg ffff880405711b80",
>> +
>> +
>> +
>> +    NULL
>> +};
>> +
>> +
>> --
>> 2.5.0
>>
>> --
>> Crash-utility mailing list
>> Crash-utility@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility