On Thu, 2 Jul 2009 18:30:04 -0700 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > On Thu, 2 Jul 2009 18:08:29 -0700 Paul Menage <menage@xxxxxxxxxx> wrote: > > > On Thu, Jul 2, 2009 at 5:53 PM, Andrew Morton<akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > >> In the first snippet, count will be at most equal to length. As length > > >> is determined from cgroup_task_count, it can be no greater than the > > >> total number of pids on the system. > > > > > > Well that's a problem, because there can be tens or hundreds of > > > thousands of pids, and there's a fairly low maximum size for kmalloc()s > > > (include/linux/kmalloc_sizes.h). > > > > > > And even if this allocation attempt doesn't exceed KMALLOC_MAX_SIZE, > > > large allocations are less unreliable. __There is a large break point at > > > 8*PAGE_SIZE (PAGE_ALLOC_COSTLY_ORDER). > > > > This has been a long-standing problem with the tasks file, ever since > > the cpusets days. > > > > There are ways around it - Lai Jiangshan <laijs@xxxxxxxxxxxxxx> posted > > a patch that allocated an array of pages to store pids in, with a > > custom sorting function that let you specify indirection rather than > > assuming everything was in one contiguous array. This was technically > > the right approach in terms of not needing vmalloc and never doing > > large allocations, but it was very complex; an alternative that was > > mooted was to use kmalloc for small cgroups and vmalloc for large > > ones, so the vmalloc penalty wouldn't be paid generally. The thread > > fizzled AFAICS. > > It's a problem which occurs fairly regularly. Some sites are fairly > busted. Many gave up and used vmalloc(). Others use an open-coded > array-of-pages thing. > > This happens enough that I expect the kernel would benefit from a > general dynamic-array library facility. Something whose interface > mimics the C-level array operations but which is internally implemented > via some data structure which uses PAGE_SIZE allocations. Probably a > simple two-level thing would suffice. > I think both of kmalloc usage here are very bad. Why we can't do what readdir(/proc) does ? I'm sorry I misunderstand. Following is an easy example. 0. at open, inilialize f_pos to 0. f_pos is used as "pid" remember "css_set with hole" as template in f_private?(or somewhere) at open ...like this. -- struct cgroupfs_root *root = cgrp->root; struct cgroup *template = kzalloc(sizeof(void*) * CGROUP_SUBSYS_COUNT); for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) if (root->subsys_bits & (1UL << i)) template[i] = cgrp->subsys[i]; -- 1. at read(), find task_struct of "pid" in f_pos. 2. look up task_struct of "pid" and compare with f_private -- struct cgroup *template = f_private; for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { if (!template[i]) contiue; if (template[i] != task_subsys_state(task, i)) break; } if (i == CGROUP_SUBSYS_COUNT) print task; -- 4. f_pos++ until filling seq_buffer. Thanks, -Kame _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers