On 09.03.2015 07:50, Greg Thelen wrote:
> When modifying PG_Dirty on cached file pages, update the new
> MEM_CGROUP_STAT_DIRTY counter. This is done in the same places where
> global NR_FILE_DIRTY is managed. The new memcg stat is visible in the
> per memcg memory.stat cgroupfs file. The most recent past attempt at
> this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632
>
> The new accounting supports future efforts to add per cgroup dirty
> page throttling and writeback. It also helps an administrator break
> down a container's memory usage and provides evidence to understand
> memcg oom kills (the new dirty count is included in memcg oom kill
> messages).
>
> The ability to move page accounting between memcg
> (memory.move_charge_at_immigrate) makes this accounting more
> complicated than the global counter. The existing
> mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
> accounting with stat updates.
> Typical update operation:
>     memcg = mem_cgroup_begin_page_stat(page)
>     if (TestSetPageDirty()) {
>         [...]
>         mem_cgroup_update_page_stat(memcg)
>     }
>     mem_cgroup_end_page_stat(memcg)
>
> Summary of mem_cgroup_end_page_stat() overhead:
> - Without CONFIG_MEMCG it's a no-op
> - With CONFIG_MEMCG and no inter memcg task movement, it's just
>   rcu_read_lock()
> - With CONFIG_MEMCG and inter memcg task movement, it's
>   rcu_read_lock() + spin_lock_irqsave()
>
> A memcg parameter is added to several routines because their callers
> now grab mem_cgroup_begin_page_stat() which returns the memcg later
> needed by mem_cgroup_update_page_stat().
>
> Because mem_cgroup_begin_page_stat() may disable interrupts, some
> adjustments are needed:
> - move __mark_inode_dirty() from __set_page_dirty() to its caller.
>   __mark_inode_dirty() locking does not want interrupts disabled.
> - use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
>   __delete_from_page_cache(), replace_page_cache_page(),
>   invalidate_complete_page2(), and __remove_mapping().

This patch conflicts with my cleanup, which is already in the mm tree:
("page_writeback: clean up mess around cancel_dirty_page()")
Nothing nontrivial, but I've killed cancel_dirty_page() and replaced
it with account_page_cleaned(), symmetrical to account_page_dirtied().

I think this accounting can be done without mem_cgroup_begin_page_stat().
All page cleaning happens under the page lock.
Some dirtying is done without the page lock, when the kernel moves
dirty status from the pte to the page, but in this case the accounting
happens under mapping->tree_lock.

Memcg already locks pages when it moves them between cgroups;
maybe it could also take mapping->tree_lock?
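
To make the locking idea concrete, here is a rough userspace model, not
kernel code. All names ending in _model are made up for illustration, and
a pthread mutex stands in for mapping->tree_lock. It only shows the
intended invariant: if the dirtying path and the move path both hold the
same lock, the dirty count always follows the page to its new memcg.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct memcg_model {
	long nr_dirty;			/* models the per-memcg dirty counter */
};

struct mapping_model {
	pthread_mutex_t tree_lock;	/* models mapping->tree_lock */
};

struct page_model {
	struct mapping_model *mapping;
	struct memcg_model *memcg;	/* memcg the page is charged to */
	bool dirty;			/* models PG_dirty */
};

/*
 * Dirtying path: the flag and the counter change together under
 * tree_lock. Returns true only if the page was newly dirtied.
 */
static bool model_set_page_dirty(struct page_model *page)
{
	bool newly_dirty = false;

	pthread_mutex_lock(&page->mapping->tree_lock);
	if (!page->dirty) {
		page->dirty = true;
		page->memcg->nr_dirty++;
		newly_dirty = true;
	}
	pthread_mutex_unlock(&page->mapping->tree_lock);
	return newly_dirty;
}

/*
 * Move path: takes the same tree_lock, so it cannot race with the
 * dirtying path above and can transfer the dirty count safely.
 */
static void model_move_account(struct page_model *page,
			       struct memcg_model *to)
{
	pthread_mutex_lock(&page->mapping->tree_lock);
	if (page->dirty) {
		page->memcg->nr_dirty--;
		to->nr_dirty++;
	}
	page->memcg = to;
	pthread_mutex_unlock(&page->mapping->tree_lock);
}
```

In the real kernel the move path would also have to respect the existing
lock ordering and interrupt state around tree_lock; the model ignores both.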