On Tue, Jun 04, 2013 at 01:50:25PM -0700, Tejun Heo wrote:
> Hello, Michal.
>
> On Tue, Jun 04, 2013 at 03:18:43PM +0200, Michal Hocko wrote:
> > > +	if (memcg)
> > > +		css_get(&memcg->css);
> >
> > This is all good and nice but it re-introduces the same problem which
> > has been fixed by (5f578161: memcg: relax memcg iter caching). You are
> > pinning memcg in memory for unbounded amount of time because css
> > reference will not let object to leave and rest.
>
> I don't get why that is a problem. Can you please elaborate? css's
> now explicitly allow holding onto them. We now have clear separation
> of "destruction" and "release" and blkcg also depends on it. If memcg
> still doesn't distinguish the two properly, that's where the problem
> should be fixed.
>
> > I understand your frustration about the complexity of the current
> > synchronization but we didn't come up with anything easier.
> > Originally I though that your tree walk updates which allow dropping rcu
> > would help here but then I realized that not really because the iterator
> > (resp. pos) has to be a valid pointer and there is only one possibility
> > to do that AFAICS here and that is css pinning. And is no-go.
>
> I find the above really weird. If css can't be pinned for position
> caching, isn't it natural to ask why it can't be and then fix it?
> Because that's what the whole refcnt thing is about and a usage which
> cgroup explicitly allows (e.g. blkcg also does it). Why do you go
> from there to "this batshit crazy barrier dancing is the only
> solution"?
>
> Can you please explain why memcg css's can't be pinned?

We might pin them indefinitely. In a hierarchy with hundreds of
groups that is short by 10M of memory, we only reclaim from a couple
of groups before we stop and leave the iterator pointing somewhere in
the hierarchy. Until the next reclaimer comes along, which might be a
split second later or three days later.

There is a reclaim iterator for every memcg (since every memcg
represents a hierarchy), so we could pin a lot of css's for an
indefinite amount of time.

If you say that the delta between destruction and release is small
enough, I'd be happy to get rid of the weak referencing.

We had weak referencing with css_id before and didn't want to lose
predictability and efficiency of our resource usage when switching
away from it.
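
To make the concern concrete, here is a toy userspace model of the
difference between a pinned and a weak cached position. It is only a
sketch: struct toy_css and the toy_* helpers are made up for
illustration and are not the kernel's css API, and the model assumes
each parked iterator points at a distinct group that is then removed
while no reclaimer runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a css. Hypothetical; not the kernel's structure. */
struct toy_css {
    int refcnt;      /* starts at 1: the base reference owned by the group */
    bool destroyed;  /* group was removed (base reference dropped) */
    bool released;   /* last reference gone, memory could be freed */
};

static void toy_css_get(struct toy_css *css)
{
    assert(!css->released);
    css->refcnt++;
}

static void toy_css_put(struct toy_css *css)
{
    assert(css->refcnt > 0);
    if (--css->refcnt == 0)
        css->released = true;
}

/* "Destruction": mark the group dead and drop its base reference. */
static void toy_css_destroy(struct toy_css *css)
{
    css->destroyed = true;
    toy_css_put(css);
}

/*
 * Park one cached iterator position on each of n_groups groups, remove
 * every group while no reclaimer runs, and return how many groups are
 * destroyed but not yet released. Returns -1 on allocation failure.
 */
static int toy_stuck_groups(int n_groups, bool pin)
{
    struct toy_css *groups;
    int i, stuck = 0;

    if (n_groups <= 0)
        return 0;
    groups = calloc((size_t)n_groups, sizeof(*groups));
    if (!groups)
        return -1;

    for (i = 0; i < n_groups; i++) {
        groups[i].refcnt = 1;
        if (pin)
            toy_css_get(&groups[i]);  /* pinned position holds a reference */
    }
    for (i = 0; i < n_groups; i++)
        toy_css_destroy(&groups[i]);
    for (i = 0; i < n_groups; i++)
        if (groups[i].destroyed && !groups[i].released)
            stuck++;

    free(groups);
    return stuck;
}
```

With pin set, toy_stuck_groups(100, true) reports all 100 groups as
destroyed but unreleased until some reclaimer moves the iterators.
With weak positions, toy_stuck_groups(100, false) reports 0, at the
price of having to validate the cached pointer before using it.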