On Wed, Aug 31, 2011 at 11:40 PM, Johannes Weiner <jweiner@xxxxxxxxxx> wrote:
> On Wed, Aug 31, 2011 at 11:05:51PM -0700, Ying Han wrote:
>> On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@xxxxxxxxxx> wrote:
>> > You want to look at A and see whether its limit was responsible for
>> > reclaim scans in any children. IMO, that is asking the question
>> > backwards. Instead, there is a cgroup under reclaim and one wants to
>> > find out the cause for that. Not the other way round.
>> >
>> > In my original proposal I suggested differentiating reclaim caused by
>> > internal pressure (due to own limit) and reclaim caused by
>> > external/hierarchical pressure (due to limits from parents).
>> >
>> > If you want to find out why C is under reclaim, look at its reclaim
>> > statistics. If the _limit numbers are high, C's limit is the problem.
>> > If the _hierarchical numbers are high, the problem is B, A, or
>> > physical memory, so you check B for _limit and _hierarchical as well,
>> > then move on to A.
>> >
>> > Implementing this would be as easy as passing not only the memcg to
>> > scan (victim) to the reclaim code, but also the memcg /causing/ the
>> > reclaim (root_mem):
>> >
>> >     root_mem == victim -> account to victim as _limit
>> >     root_mem != victim -> account to victim as _hierarchical
>> >
>> > This would make things much simpler and more natural, both the code
>> > and the way of tracking down a problem, IMO.
>>
>> This is pretty much the stats I am currently using for debugging the
>> reclaim patches. For example:
>>
>> scanned_pages_by_system 0
>> scanned_pages_by_system_under_hierarchy 50989
>>
>> scanned_pages_by_limit 0
>> scanned_pages_by_limit_under_hierarchy 0
>>
>> "_system" is count under global reclaim, and "_limit" is count under
>> per-memcg reclaim.
>> "_under_hierarchy" is set if memcg is not the one triggering pressure.
>
> I don't get this distinction between _system and _limit. How is it
> orthogonal to _limit vs. _hierarchy, i.e.
> internal vs. external?

Something like:

+enum mem_cgroup_scan_context {
+	SCAN_BY_SYSTEM,
+	SCAN_BY_SYSTEM_UNDER_HIERARCHY,
+	SCAN_BY_LIMIT,
+	SCAN_BY_LIMIT_UNDER_HIERARCHY,
+	NR_SCAN_CONTEXT,
+};

	if (global_reclaim(sc))
		context = SCAN_BY_SYSTEM;
	else
		context = SCAN_BY_LIMIT;

	/* each _UNDER_HIERARCHY value directly follows its base value */
	if (target != mem)
		context++;

> If the system scans memcgs then no limit is at fault. It's just
> external pressure.
>
> For example, what is the distinction between scanned_pages_by_system
> and scanned_pages_by_system_under_hierarchy?

You are right about this. There is not much difference between the two,
since both count global reclaim and every memcg is "_under_hierarchy"
except the root cgroup. The root cgroup is counted in "_system"
(internal).

> The reason for scanned_pages_by_system would be, per your definition,
> neither due to the limit (_by_system -> global reclaim) nor not due to
> the limit (!_under_hierarchy -> memcg is the one triggering pressure)

The value "scanned_pages_by_system" only makes sense for the root
cgroup, where it can now be read as "# of pages scanned in the root LRU
under global reclaim".

--Ying
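
To make the accounting rule above concrete, here is a minimal userspace
sketch of how the context could be picked and charged to the scanned
memcg. It is an illustration only, not the actual patch: struct
memcg_sketch, scan_context() and account_scan() are hypothetical names,
and the bool argument stands in for global_reclaim(sc). It also assumes
that global reclaim is triggered on behalf of the root cgroup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Scan contexts as in the enum quoted in this mail. Each
 * _UNDER_HIERARCHY value must directly follow its base value,
 * because the selection below relies on "context + 1".
 */
enum mem_cgroup_scan_context {
	SCAN_BY_SYSTEM,
	SCAN_BY_SYSTEM_UNDER_HIERARCHY,
	SCAN_BY_LIMIT,
	SCAN_BY_LIMIT_UNDER_HIERARCHY,
	NR_SCAN_CONTEXT,
};

/* Hypothetical stand-in for a memcg: an id plus per-context counters. */
struct memcg_sketch {
	int id;
	unsigned long scanned[NR_SCAN_CONTEXT];
};

/*
 * Pick the context for one scan.
 *   global: true for global reclaim, false for per-memcg (limit) reclaim
 *   target: the memcg whose pressure triggered the reclaim
 *   mem:    the memcg whose pages are being scanned
 */
static enum mem_cgroup_scan_context
scan_context(bool global, const struct memcg_sketch *target,
	     const struct memcg_sketch *mem)
{
	int context = global ? SCAN_BY_SYSTEM : SCAN_BY_LIMIT;

	/* Scanned on behalf of someone else: use the hierarchy variant. */
	if (target != mem)
		context++;

	return (enum mem_cgroup_scan_context)context;
}

/* Charge nr_pages scanned pages to the matching counter of mem. */
static void account_scan(bool global, const struct memcg_sketch *target,
			 struct memcg_sketch *mem, unsigned long nr_pages)
{
	enum mem_cgroup_scan_context context;

	context = scan_context(global, target, mem);
	assert(context < NR_SCAN_CONTEXT);
	mem->scanned[context] += nr_pages;
}
```

Under these assumptions, scanning a child memcg during global reclaim
(target is the root, mem is the child) increments only the child's
SCAN_BY_SYSTEM_UNDER_HIERARCHY counter, which matches the example
numbers quoted above where only scanned_pages_by_system_under_hierarchy
is non-zero.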