On Wed 26-12-12 01:26:07, Sha Zhengju wrote: > From: Sha Zhengju <handai.szj@xxxxxxxxxx> > > This patch adds memcg routines to count dirty pages, which allows memory controller > to maintain an accurate view of the amount of its dirty memory and can provide some > info for users while cgroup's direct reclaim is working. I guess you meant targeted resp. (hard/soft) limit reclaim here, right? It is true that this is direct reclaim but it is not clear to me why the usefulnes should be limitted to the reclaim for users. I would understand this if the users was in fact in-kernel users. [...] > To prevent AB/BA deadlock mentioned by Greg Thelen in previous version > (https://lkml.org/lkml/2012/7/30/227), we adjust the lock order: > ->private_lock --> mapping->tree_lock --> memcg->move_lock. > So we need to make mapping->tree_lock ahead of TestSetPageDirty in __set_page_dirty() > and __set_page_dirty_nobuffers(). But in order to avoiding useless spinlock contention, > a prepare PageDirty() checking is added. But there is another AA deadlock here I believe. page_remove_rmap mem_cgroup_begin_update_page_stat <<< 1 set_page_dirty __set_page_dirty_buffers __set_page_dirty mem_cgroup_begin_update_page_stat <<< 2 move_lock_mem_cgroup spin_lock_irqsave(&memcg->move_lock, *flags); mem_cgroup_begin_update_page_stat is not recursive wrt. locking AFAICS because we might race with the moving charges: CPU0 CPU1 page_remove_rmap mem_cgroup_can_attach mem_cgroup_begin_update_page_stat (1) rcu_read_lock mem_cgroup_start_move atomic_inc(&memcg_moving) atomic_inc(&memcg->moving_account) synchronize_rcu __mem_cgroup_begin_update_page_stat mem_cgroup_stolen <<< TRUE move_lock_mem_cgroup [...] mem_cgroup_begin_update_page_stat (2) __mem_cgroup_begin_update_page_stat mem_cgroup_stolen <<< still TRUE move_lock_mem_cgroup <<< DEADLOCK [...] mem_cgroup_end_update_page_stat rcu_unlock # wake up from synchronize_rcu [...] mem_cgroup_move_task mem_cgroup_move_charge walk_page_range mem_cgroup_move_account move_lock_mem_cgroup Maybe I have missed some other locking which would prevent this from happening but the locking relations are really complicated in this area so if mem_cgroup_{begin,end}_update_page_stat might be called recursively then we need a fat comment which justifies that. [...] -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html