Re: + memcg-fix-dirty-page-migration.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 21, 2015 at 02:34:52PM -0700, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> From: Greg Thelen <gthelen@xxxxxxxxxx>
> Subject: memcg: fix dirty page migration
> 
> The problem starts with a file backed dirty page which is charged to a
> memcg.  Then page migration is used to move oldpage to newpage.
> 
> Migration:
> - copies the oldpage's data to newpage
> - clears oldpage.PG_dirty
> - sets newpage.PG_dirty
> - uncharges oldpage from memcg
> - charges newpage to memcg
> 
> Clearing oldpage.PG_dirty decrements the charged memcg's dirty page count.
>  However, because newpage is not yet charged, setting newpage.PG_dirty
> does not increment the memcg's dirty page count.  After migration
> completes newpage.PG_dirty is eventually cleared, often in
> account_page_cleaned().  At this time newpage is charged to a memcg so the
> memcg's dirty page count is decremented which causes underflow because the
> count was not previously incremented by migration.  This underflow causes
> balance_dirty_pages() to see a very large unsigned number of dirty memcg
> pages which leads to aggressive throttling of buffered writes by processes
> in non root memcg.
> 
> This issue:
> - can harm performance of non root memcg buffered writes.
> - can report too small (even negative) values in
>   memory.stat[(total_)dirty] counters of all memcg, including the root.
> 
> To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
> page_memcg() and set_page_memcg() helpers.
> 
> Test:
>     0) setup and enter limited memcg
>     mkdir /sys/fs/cgroup/test
>     echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
>     echo $$ > /sys/fs/cgroup/test/cgroup.procs
> 
>     1) buffered writes baseline
>     dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
>     sync
>     grep ^dirty /sys/fs/cgroup/test/memory.stat
> 
>     2) buffered writes with compaction antagonist to induce migration
>     yes 1 > /proc/sys/vm/compact_memory &
>     rm -rf /data/tmp/foo
>     dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
>     kill %
>     sync
>     grep ^dirty /sys/fs/cgroup/test/memory.stat
> 
>     3) buffered writes without antagonist, should match baseline
>     rm -rf /data/tmp/foo
>     dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
>     sync
>     grep ^dirty /sys/fs/cgroup/test/memory.stat
> 
>                        (speed, dirty residue)
>              unpatched                       patched
>     1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
>     2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
>     3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages
> 
>     Notice that unpatched baseline performance (1) fell after
>     migration (3): 841 -> 114 MB/s.  In the patched kernel, post
>     migration performance matches baseline.
> 
> Fixes: c4843a7593a9 ("memcg: add per cgroup dirty page accounting")
> Signed-off-by: Greg Thelen <gthelen@xxxxxxxxxx>
> Reported-by: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>	[4.2+]
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Great catch, Greg.

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]