Re: [PATCH] Add the proccgroup extension

Nikolay Borisov <n.borisov.lkml@xxxxxxxxx> · Mon, 11 Apr 2016 17:22:27 +0300

On 04/11/2016 05:06 PM, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>> Initial version of a crash module which can be used to show which cgroups
>> is the currently active process member of.
>> ---
>> Hello this is a simple crash extension that I hacked up over the weekend, in my
>> case when I look at kernel crash dump I want to quickly understand which cgroup
>> is the current process member of. Currently it uses the process from the current
>> context but this might change in the future. Here is an example output:
>>
>> crash> show_cgroups
>> subsys: cpuset               cgroup: c6666
>> subsys: cpu                  cgroup: c6666
>> subsys: cpuacct              cgroup: c6666
>> subsys: io                   cgroup: c6666
>> subsys: memory               cgroup: c6666
>> subsys: devices              cgroup: c6666
>> subsys: freezer              cgroup: c6666
>> subsys: perf_event           cgroup: c6666
>> subsys: pids                 cgroup: c6666
>>
>> I have tested this on 4.6-rc2 with and without cgroup support enabled.
>>
>> I'm just sending this to get an initial idea whether I have used crash's
>> facilities correctly and canvas for future ideas. I'm aware there are already 2
>> cgroup modules but when I tried running either they complained of no cgroup 
>> support or the command did nothing. In any case provided that the code is ok I
>> guess this can be used as a good example of how to traverse structures with crash
>>
>> TODO:
>>  * Make the command understand either task_struct pointer or pid being passed.
>>  * Add support for pre-3.15 kernels (the cgroup name struct changed to kernfs at that point)
>>  * Whatever people think might be useful
> 
> 
> Hello Nikolay,
> 
> I have a few comments and suggestions:
> 
> +#include "defs.h"
> +
> +#define CGROUP_PATH_MAX
> 
> What is the purpose of CGROUP_PATH_MAX?  

I was thinking of using this define to limit the size of various
buffers, but I saw that for most cases BUFSIZE works. This will likely
be removed.

> 
> +void proccgroup_init(void);
> +void proccgroup_fini(void);
> +
> ... [ cut ] ...
> +
> +void __attribute__((constructor))
> +echo_init(void)
> +{
> +        register_extension(command_table);
> +}
> +
> +void __attribute__((destructor))
> +echo_fini(void) { }
> +
> 
> You've got forward declarations of proccgroup_init() and proccgroup_fini(), 
> but the two functions don't exist.  The "echo_xxx()" functions in the 
> echo.c sample module are meant to be replaced by names reflecting your
> module. 

I will rename the echo* function, I guess I forgot doing that. Actually
are forward declarations for the init/destroy functions really needed? I
guess not.

> 
> If I run this module on a 3.10 kernel, I get this:
>   
> crash> extend proccgroup.so
> ./extensions/proccgroup.so: shared object loaded
> crash> show_cgroups
> Unsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionUnsupported kernel versionshow_cgroups: invalid kernel virtual address: 0  type: "cgroup_subsys_state->cgroup"
>   Unsupported kernel version
> crash>
> 
> because of this:
>   
> +        /* Handle the 2 cases of cgroup_name and the kernfs one */
> +        if (MEMBER_EXISTS("cgroup", "kn")) {
> +            get_kn_cgroup_name(cgroup, subsys_ptr);
> +        } else if (MEMBER_EXISTS("cgroup", "name")) {
> +            fprintf(fp, "Unsupported kernel version");
> +        }
> 
> You should replace the "fprintf(fp, "Unsupported kernel version")" error message
> above with an "error(FATAL, ...)" construct (and add a "\n" to the end), so that
> the command will fail immediately and return to the "crash> " prompt.

I've since added supported for pre-3.15 kernels. So this will likely
work, but I did see the output you showed. Clearly the failure case
wasn't handled properly.

> 
> Alternatively, you can have your module purposely fail to load if any required
> structure members do not exist.  Typically modules will have their xxxx_init() 
> constructor function do some pre-checks of the kernel being analyzed, and if the
> command will not be able to run correctly, return before calling register_extension():
>   
>   void __attribute__((constructor))
>   xxxx_init(void)
>   {
>    	if (!MEMBER_EXISTS(...) || !MEMBER_EXISTS(...)) {
>   		fprintf(fp, "some error message\n");
>   		return;
>   	}
>   
> 	register_extension(command_table);
>   }
> 
> I note that all of your readmem() calls use RETURN_ON_ERROR, such as these examples:
> 
> +static void get_cgroup_name(ulong cgroup, char *buf, size_t buflen) 
> +{
> +
> +    ulong kernfs_node;
> +    ulong cgroup_name_ptr;
> +    ulong kernfs_parent;
> +
> +    /* Get cgroup->kn */
> +    readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node, sizeof(void *),
> +            "cgroup->kn", RETURN_ON_ERROR);
> +
> +    readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"), KVADDR, &kernfs_parent, sizeof(void *),
> +            "kernfs_node->parent", RETURN_ON_ERROR);
> +
> +    if (kernfs_parent == 0) {
> +        sprintf(buf, "/");
> +        return;
> +    }
> +
> +    /* Get kn->name */
> +    readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
> +            "kernfs_node->name", RETURN_ON_ERROR);
> +
> +    read_string(cgroup_name_ptr, buf, buflen-1);
> +}
> 
> Unless you can do something useful in the case of a readmem() error, it's best to
> use FAULT_ON_ERROR.  You will get an error message automatically generated by readmem(),
> but similar to "error(FATAL, ...)" it will return immediately to the "crash> " prompt.
> In your module above, you would just get a stream of confusing error messages.

I thought return_on_error essentially works with fault_on_error
semantics. Clearly I've misunderstood that so will fix it.

>   
> You might consider changing the name of the command to something that
> flows off the fingertips a bit easier:
> 
> +static struct command_table_entry command_table[] = {
> +        { "show_cgroups", show_proc_cgroups, help_proc_cgroups, 0},
> +        { NULL },
> +};
> 
> Maybe something like "cgrp", or "showcg", or anything short but to
> the point.

Yes, I'm not very happy with the current name so will likely change that.

Thanks for taking the time to review it.

>  
> Thanks,
>   Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility