On Wed, 22 Jan 2014, Michal Hocko wrote: > On Tue 21-01-14 13:18:42, Hugh Dickins wrote: > [...] > > We do have a confusing situation. The hang goes back to 3.10 but takes > > two different forms, because of intervening changes: in 3.10 and 3.11 > > mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and > > 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next > > and cannot return to its caller. > > > > Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly > > to 3.11, but will have to be rediffed for 3.10 because of rearrangement > > in between). > > I will backport it when it reaches stable queue. Thank you. Testing, at home and at work, has confirmed your two patches are good. Greg's particular test on 3.11 gave convincing results, and I was helped by Linus's tree of last night (df32e43a54d0) happening to be quite quick to reproduce the issue on my laptop. > > > Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies > > correctly to neither of them because it's diffed on top of my CSS_ONLINE > > fix). Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear > > whether Michal is claiming that it would also fix the hang in 3.12 and > > 3.13 if we didn't have 2/2: I doubt that, and haven't tested that. > > Actually both patches are needed. If we had only 2/2 then we wouldn't > endless loop inside mem_cgroup_iter but we could still return root to > caller all the time because mem_cgroup_iter_load would return NULL on > css_tryget failure on the cached root. Or am I missing something that > would prevent that? In theory I agree with you; and if you're missing something, I can't see it either. But in practice, all my earlier testing of 3.12 and 3.13 was just with 2/2, and I've tried without your 1/2 since: whereas I have hung on 3.12 and 3.13 a convincing number of times without 2/2, I have never hung on them with 2/2 without 1/2. In practice 1/2 appears essential for 3.10 and 3.11, but irrelevant for 3.12 and 3.13. That could be easy to explain if there were a difference at the calling end, shrink_zone(), between those releases: but I don't see that. Odd. Either we're both missing something, or my testing is even less reliable than I'd thought. But since I certainly don't dispute 1/2, it is merely academic. Though still bothersome. > > > Given how Michal has diffed this patch on top of my CSS_ONLINE one > > (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch), > > it would be helpful if you could mark that one also for stable 3.12+, > > to save us from having to rediff this one for stable. We don't have > > a concrete example of a problem it solves in the vanilla kernel, but > > it makes more sense to include it than to exclude it. > > Yes, I think it makes sense to queue it for 3.12+ as well because it is > non intrusive and potential issues would be really subtle. Before Andrew sends these all off to Linus, I should admit that there's probably a refinement still to come to the CSS_ONLINE one. I'm ashamed to admit that I overlooked a much earlier comment from Greg Thelen, who suggested that a memory barrier might be required. I think he's right, and I'd have liked to say exactly what and where before answering you now; but barriers are tricky elusive things, I've not yet decided, and better report back to you now on the testing result. Hugh > > > (You would be right to point out that the CSS_ONLINE one fixes > > something that goes back to 3.10: I'm saying 3.12+ because I'm not > > motivated to rediff it for 3.10 and 3.11 when there's nothing to > > go on top; but that's not a very good reason to lie - overrule me.) > > > > Hugh > > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>