Re: [PATCH] memcg: add per cgroup dirty page accounting

On 09.03.2015 07:50, Greg Thelen wrote:
> When modifying PG_Dirty on cached file pages, update the new
> MEM_CGROUP_STAT_DIRTY counter.  This is done in the same places where
> global NR_FILE_DIRTY is managed.  The new memcg stat is visible in the
> per memcg memory.stat cgroupfs file.  The most recent past attempt at
> this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632
>
> The new accounting supports future efforts to add per cgroup dirty
> page throttling and writeback.  It also helps an administrator break
> down a container's memory usage and provides evidence to understand
> memcg oom kills (the new dirty count is included in memcg oom kill
> messages).
>
> The ability to move page accounting between memcg
> (memory.move_charge_at_immigrate) makes this accounting more
> complicated than the global counter.  The existing
> mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
> accounting with stat updates.
> Typical update operation:
> 	memcg = mem_cgroup_begin_page_stat(page)
> 	if (TestSetPageDirty()) {
> 		[...]
> 		mem_cgroup_update_page_stat(memcg)
> 	}
> 	mem_cgroup_end_page_stat(memcg)
>
> Summary of mem_cgroup_end_page_stat() overhead:
> - Without CONFIG_MEMCG it's a no-op
> - With CONFIG_MEMCG and no inter memcg task movement, it's just
>    rcu_read_lock()
> - With CONFIG_MEMCG and inter memcg task movement, it's
>    rcu_read_lock() + spin_lock_irqsave()
>
> A memcg parameter is added to several routines because their callers
> now grab mem_cgroup_begin_page_stat() which returns the memcg later
> needed by mem_cgroup_update_page_stat().
>
> Because mem_cgroup_begin_page_stat() may disable interrupts, some
> adjustments are needed:
> - move __mark_inode_dirty() from __set_page_dirty() to its caller.
>    __mark_inode_dirty() locking does not want interrupts disabled.
> - use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
>    __delete_from_page_cache(), replace_page_cache_page(),
>    invalidate_complete_page2(), and __remove_mapping().

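For concreteness, the pattern quoted above presumably ends up looking
something like this in a set-dirty path (a rough sketch based on the
description, not on the actual patch):

	struct mem_cgroup *memcg;
	unsigned long flags;

	/* serializes the stat update against memcg page moving */
	memcg = mem_cgroup_begin_page_stat(page);
	if (!TestSetPageDirty(page)) {
		/* page was not dirty before, account it */
		spin_lock_irqsave(&mapping->tree_lock, flags);
		/* bumps NR_FILE_DIRTY and the new MEM_CGROUP_STAT_DIRTY */
		account_page_dirtied(page, mapping, memcg);
		radix_tree_tag_set(&mapping->page_tree,
				   page_index(page), PAGECACHE_TAG_DIRTY);
		spin_unlock_irqrestore(&mapping->tree_lock, flags);
	}
	mem_cgroup_end_page_stat(memcg);
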
This patch conflicts with my cleanup which is already in the mm tree:
("page_writeback: clean up mess around cancel_dirty_page()")
Nothing nontrivial, but I've killed cancel_dirty_page() and replaced
it with account_page_cleaned(), symmetrical to account_page_dirtied().
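For reference, the new helper is roughly the mirror image of
account_page_dirtied() (quoting from memory, details may differ):

	void account_page_cleaned(struct page *page,
				  struct address_space *mapping)
	{
		if (mapping_cap_account_dirty(mapping)) {
			/* undo exactly what account_page_dirtied() did */
			dec_zone_page_state(page, NR_FILE_DIRTY);
			dec_bdi_stat(inode_to_bdi(mapping->host),
				     BDI_RECLAIMABLE);
			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
		}
	}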


I think this accounting can be done without mem_cgroup_begin_page_stat().
All page cleaning happens under the page lock.
Some dirtying is done without the page lock, when the kernel moves
the dirty status from a pte to the page, but in that case accounting
happens under mapping->tree_lock.
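
For example, __set_page_dirty_nobuffers() already does the accounting
with the tree lock held (trimmed):

	int __set_page_dirty_nobuffers(struct page *page)
	{
		if (!TestSetPageDirty(page)) {
			struct address_space *mapping = page_mapping(page);
			unsigned long flags;

			if (!mapping)
				return 1;

			spin_lock_irqsave(&mapping->tree_lock, flags);
			if (page_mapping(page)) { /* Race with truncate? */
				account_page_dirtied(page, mapping);
				radix_tree_tag_set(&mapping->page_tree,
						   page_index(page),
						   PAGECACHE_TAG_DIRTY);
			}
			spin_unlock_irqrestore(&mapping->tree_lock, flags);
			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
			return 1;
		}
		return 0;
	}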

Memcg already locks pages when it moves them between cgroups;
maybe it could also take mapping->tree_lock?
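
Something like this in mem_cgroup_move_account(), purely hypothetical
and untested:

	struct address_space *mapping = page_mapping(page);
	unsigned long flags;

	if (mapping)
		spin_lock_irqsave(&mapping->tree_lock, flags);
	if (PageDirty(page)) {
		/* with the page locked and tree_lock held, the dirty
		 * bit is stable, so the counter can move safely */
		__this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_DIRTY],
			       nr_pages);
		__this_cpu_add(to->stat->count[MEM_CGROUP_STAT_DIRTY],
			       nr_pages);
	}
	if (mapping)
		spin_unlock_irqrestore(&mapping->tree_lock, flags);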
