On Wed, Jan 23 2013, Glauber Costa wrote: > In order to maintain all the memcg bookkeeping, we need per-node > descriptors, which will in turn contain a per-zone descriptor. > > Because we want to statically allocate those, this array ends up being > very big. Part of the reason is that we allocate something large enough > to hold MAX_NUMNODES, the compile time constant that holds the maximum > number of nodes we would ever consider. > > However, we can do better in some cases if the firmware help us. This is > true for modern x86 machines; coincidentally one of the architectures in > which MAX_NUMNODES tends to be very big. > > By using the firmware-provided maximum number of nodes instead of > MAX_NUMNODES, we can reduce the memory footprint of struct memcg > considerably. In the extreme case in which we have only one node, this > reduces the size of the structure from ~ 64k to ~2k. This is > particularly important because it means that we will no longer resort to > the vmalloc area for the struct memcg on defconfigs. We also have enough > room for an extra node and still be outside vmalloc. > > One also has to keep in mind that with the industry's ability to fit > more processors in a die as fast as the FED prints money, a nodes = 2 > configuration is already respectably big. > > [ v2: use size_t for size calculations ] > Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxx> > Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx> > Cc: Greg Thelen <gthelen@xxxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: Ying Han <yinghan@xxxxxxxxxx> > Cc: Mel Gorman <mgorman@xxxxxxx> > Cc: Rik van Riel <riel@xxxxxxxxxx> Reviewed-by: Greg Thelen <gthelen@xxxxxxxxxx> > --- > mm/memcontrol.c | 40 +++++++++++++++++++++++++--------------- > 1 file changed, 25 insertions(+), 15 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 09255ec..09d8b02 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -172,7 +172,7 @@ struct mem_cgroup_per_node { > }; > > struct mem_cgroup_lru_info { > - struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES]; > + struct mem_cgroup_per_node *nodeinfo[0]; It seems like a VM_BUG_ON() in mem_cgroup_zoneinfo() asserting that the nid index is less than nr_node_ids would be good at catching illegal indexes. I don't see any illegal indexes in your patch, but I fear that someday a MAX_NUMNODES based for() loop might sneak in. > }; > > /* > @@ -276,17 +276,6 @@ struct mem_cgroup { > */ > struct res_counter kmem; > /* > - * Per cgroup active and inactive list, similar to the > - * per zone LRU lists. > - */ > - struct mem_cgroup_lru_info info; > - int last_scanned_node; > -#if MAX_NUMNODES > 1 > - nodemask_t scan_nodes; > - atomic_t numainfo_events; > - atomic_t numainfo_updating; > -#endif > - /* > * Should the accounting and control be hierarchical, per subtree? > */ > bool use_hierarchy; > @@ -349,8 +338,29 @@ struct mem_cgroup { > /* Index in the kmem_cache->memcg_params->memcg_caches array */ > int kmemcg_id; > #endif > + > + int last_scanned_node; > +#if MAX_NUMNODES > 1 > + nodemask_t scan_nodes; > + atomic_t numainfo_events; > + atomic_t numainfo_updating; > +#endif > + /* > + * Per cgroup active and inactive list, similar to the > + * per zone LRU lists. > + * > + * WARNING: This has to be the last element of the struct. Don't > + * add new fields after this point. > + */ > + struct mem_cgroup_lru_info info; > }; > > +static inline size_t memcg_size(void) > +{ > + return sizeof(struct mem_cgroup) + > + nr_node_ids * sizeof(struct mem_cgroup_per_node); > +} > + Tangential question: why use inline here? I figure that modern compilers are good at making inlining decisions. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>