On Thu, 23 Jan 2014, Michal Hocko wrote: > On Thu 23-01-14 02:42:58, Hugh Dickins wrote: > > > > > > Actually both patches are needed. If we had only 2/2 then we wouldn't > > > endless loop inside mem_cgroup_iter but we could still return root to > > > caller all the time because mem_cgroup_iter_load would return NULL on > > > css_tryget failure on the cached root. Or am I missing something that > > > would prevent that? > > > > In theory I agree with you; and if you're missing something, I can't see > > it either. But in practice, all my earlier testing of 3.12 and 3.13 was > > just with 2/2, and I've tried without your 1/2 since: whereas I have hung > > on 3.12 and 3.13 a convincing number of times without 2/2, I have never > > hung on them with 2/2 without 1/2. In practice 1/2 appears essential > > for 3.10 and 3.11, but irrelevant for 3.12 and 3.13. > > > > That could be easy to explain if there were a difference at the calling > > end, shrink_zone(), between those releases: but I don't see that. Odd. > > Either we're both missing something, or my testing is even less reliable > > than I'd thought. But since I certainly don't dispute 1/2, it is merely > > academic. Though still bothersome. > > I would assume that it is (sc->nr_reclaimed >= sc->nr_to_reclaim) that > helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to > before css_offline racing part reparents all the memory. But wouldn't explain why I could see it on 3.10,11 but not on 3.12,13. Perhaps the 2/2 problem is a lot easier to hit than the 1/2 problem, and I mistakenly expected to see the 1/2 problem in the timescale that I saw the 2/2 problem; but I don't really think either is the case. > > Anyway, I would feel safer if this was pushed fixed although you haven't > reporoduced it. Absolutely. > > Before Andrew sends these all off to Linus, I should admit that there's > > probably a refinement still to come to the CSS_ONLINE one. I'm ashamed > > to admit that I overlooked a much earlier comment from Greg Thelen, who > > suggested that a memory barrier might be required. > > I was thinking about mem barrier while reviewing your patch but then I > convinced myself that we should be safe also without using one when > checking CSS_ONLINE. > We have basically two situations. > - online_css when we can miss it being set which is OK because > we would miss a new empty group. > - offline_css when we could still see the flag being set but > then css_tryget would be already failing. > > So while all this is subtle and relies on cgroup core heavily I think we > should be safe wrt. memory barriers. > > Or did you mean something else here? Something else. My CSS_OFFLINE patch claims to prevent the iterator from returning an uninitialized struct mem_cgroup: if that is to be relied upon, then it needs to make sure that the initialization of the mem_cgroup is visible to the caller before the CSS_OFFLINE flag. kernel/cgroup.c online_css() does nowadays have an smp_wmb() buried in its rcu_assign_pointer(); but it's not in the right place to make this particular guarantee. And an smp_rmb() needed somewhere too, if it doesn't already come for free somehow. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>