Re: 4.3-rc1 dirty page count underflow (cgroup-related?)

Dave Hansen <dave.hansen@xxxxxxxxx> · Fri, 18 Sep 2015 07:50:42 -0700

On 09/17/2015 11:09 PM, Greg Thelen wrote:
> I'm not denying the issue, bug the WARNING splat isn't necessarily
> catching a problem.  The corresponding code comes from your debug patch:
> +		WARN_ONCE(__this_cpu_read(memcg->stat->count[MEM_CGROUP_STAT_DIRTY]) > (1UL<<30), "MEM_CGROUP_STAT_DIRTY bogus");
> 
> This only checks a single cpu's counter, which can be negative.  The sum
> of all counters is what matters.
> Imagine:
> cpu1) dirty page: inc
> cpu2) clean page: dec
> The sum is properly zero, but cpu2 is -1, which will trigger the WARN.
> 
> I'll look at the code and also see if I can reproduce the failure using
> mem_cgroup_read_stat() for all of the new WARNs.

D'oh.  I'll replace those with the proper mem_cgroup_read_stat() and
test with your patch to see if anything still triggers.

> Did you notice if the global /proc/meminfo:Dirty count also underflowed?

It did not underflow.  It was one of the first things I looked at and it
looked fine, went down near 0 at 'sync', etc...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>