Sorry all for taking a lot of time to reply to this. I've been really busy. On 03/19/2013 05:58 PM, Michal Hocko wrote: > On Tue 05-03-13 17:10:56, Glauber Costa wrote: >> It is an accepted fact that memcg sucks. But can it suck faster? Or in >> a more fair statement, can it at least stop draining everyone's >> performance when it is not in use? >> >> This experimental and slightly crude patch demonstrates that we can do >> that by using static branches to patch it out until the first memcg >> comes to life. There are edges to be trimmed, and I appreciate comments >> for direction. In particular, the events in the root are not fired, but >> I believe this can be done without further problems by calling a >> specialized event check from mem_cgroup_newpage_charge(). >> >> My goal was to have enough numbers to demonstrate the performance gain >> that can come from it. I tested it in a 24-way 2-socket Intel box, 24 Gb >> mem. I used Mel Gorman's pft test, that he used to demonstrate this >> problem back in the Kernel Summit. There are three kernels: >> >> nomemcg : memcg compile disabled. >> base : memcg enabled, patch not applied. >> bypassed : memcg enabled, with patch applied. >> >> base bypassed >> User 109.12 105.64 >> System 1646.84 1597.98 >> Elapsed 229.56 215.76 >> >> nomemcg bypassed >> User 104.35 105.64 >> System 1578.19 1597.98 >> Elapsed 212.33 215.76 > > Do you have profiles for where we spend the time? > I don't *have* in the sense that I never saved them, but it is easy to grab. I've just run Mel's pft test with perf top -a in parallel, and that was mostly the charge and uncharge functions being run. >> #ifdef CONFIG_MEMCG >> +extern struct static_key memcg_in_use_key; >> + >> +static inline bool mem_cgroup_subsys_disabled(void) >> +{ >> + return !!mem_cgroup_subsys.disabled; >> +} >> + >> +static inline bool mem_cgroup_disabled(void) >> +{ >> + /* >> + * Will always be false if subsys is disabled, because we have no one >> + * to bump it up. So the test suffices and we don't have to test the >> + * subsystem as well >> + */ > > but static_key_false adds an atomic read here which is more costly so I > am not sure you are optimizing much. > No it doesn't. You're missing the point of static branches: The code is *patched out* until it is not used. So it adds a predictable deterministic jump instruction to the false statement, and that's it (hence their previous name 'jump label'). >> + >> +extern int __mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm, >> + gfp_t gfp_mask); >> +static inline int >> +mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask) >> +{ >> + if (mem_cgroup_disabled()) >> + return 0; >> + >> + return __mem_cgroup_cache_charge(page, mm, gfp_mask); >> +} > > Are there any reasons to not get down to __mem_cgroup_try_charge? We > will not be perfect, all right, because some wrappers already do some > work but we should at least cover most of them. > > I am also thinking whether this stab at charging path is not just an > overkill. Wouldn't it suffice to do something like: I don't know. I could test. I just see no reason for that. Being able to patch out code in the caller level means we'll not incur even a function call. That's a generally accepted good thing to do in hot paths. Specially given the fact that the memcg overhead seems not to be concentrated in one single place, but as Christoph Lameter defined, "death by a thousand cuts", I'd much rather not even pay the function calls if I can avoid. If I introducing great complexity for that, fine, I could trade off. But honestly, the patch gets bigger but that's it. >> struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *); >> struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *); > [...] >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index bfbf1c2..45c1886 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c > [...] >> @@ -1335,6 +1345,20 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct zone *zone) >> memcg = pc->mem_cgroup; > > I would expect that you want to prevent lookup as well if there are no > other groups. > well, that function in specific seems to be mostly called during reclaim, where I wasn't terribly concerned about optimizations, unlike the steady state functions. >> /* >> + * Because we lazily enable memcg only after first child group is >> + * created, we can have memcg == 0. Because page cgroup is created with >> + * GFP_ZERO, and after charging, all page cgroups will have a non-zero >> + * cgroup attached (even if root), we can be sure that this is a >> + * used-but-not-accounted page. (due to lazyness). We could get around >> + * that by scanning all pages on cgroup init is too expensive. We can >> + * ultimately pay, but prefer to just to defer the update until we get >> + * here. We could take the opportunity to set PageCgroupUsed, but it >> + * won't be that important for the root cgroup. >> + */ >> + if (!memcg && PageLRU(page)) >> + pc->mem_cgroup = memcg = root_mem_cgroup; > > Why not return page_cgroup_zoneinfo(root_mem_cgroup, page); > This would require messing up with __mem_cgroup_uncharge_common but that > doesn't sound incredibly crazy (to the local standard of course ;)). > Could you clarify? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>