On Fri, 8 Apr 2011 11:25:56 +1000
Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Thu, Apr 07, 2011 at 05:59:35PM -0700, Greg Thelen wrote:
> > cc: linux-mm
> >
> > Dave Chinner <david@xxxxxxxxxxxxx> writes:
> > If we later find that this supposed uncommon shared inode case is
> > important then we can either implement the previously described lru
> > scanning in mem_cgroup_balance_dirty_pages() or consider extending the
> > bdi/memcg/inode data structures (perhaps with a memcg_mapping) to
> > describe such sharing.
>
> Hmm, another idea I just had. What we're trying to avoid is needing
> to a) track inodes in multiple lists, and b) scanning to find
> something appropriate to write back.
>
> Rather than tracking at page or inode granularity, how about
> tracking "associated" memcgs at the memcg level? i.e. when we detect
> an inode is already dirty in another memcg, link the current memcg
> to the one that contains the inode. Hence if we get a situation
> where a memcg is throttling with no dirty inodes, it can quickly
> find and start writeback in an "associated" memcg that it _knows_
> contain shared dirty inodes. Once we've triggered writeback on an
> associated memcg, it is removed from the list....
>

Thank you for the idea. I think we can start from the following.

0. Add some feature to set a 'preferred inode' for a memcg. I think
   fadvise(fd, MAKE_THIS_FILE_UNDER_MY_MEMCG) or
   echo fd > /memory.move_file_here
   can be added.

1. Account dirty pages for a memcg, as Greg does.

2. At the same time, account the dirty pages made dirty by threads in the
   memcg (to check whether an internal or an external thread made a page
   dirty).

3. Calculate the gap between internal and external dirty pages.

With the gap, we have several choices.

4-a. If it exceeds some threshold, send a notification. A userland daemon
     can then decide whether to move the pages to some memcg or not.
     (Of course, if the _shared_ dirtying can be caught before a page is
     made dirty, the user daemon can move the inode before it becomes
     dirty, using inotify().)
     I like getting help from userland because it can be more flexible
     than the kernel; it can read config files.

4-b. Set a flag on the memcg meaning 'this memcg is dirty-busy because of
     some external threads'. When a page is newly dirtied, check the
     thread's memcg. If the memcg of the thread and the memcg of the page
     differ, write a memo such as 'please check this memcg id, too' in
     task_struct and do a double memcg check in balance_dirty_pages().
     (How to clear the per-task flag is the difficult part ;)

I don't want to handle the case where 3-100 threads do shared writes ;)
For that we'll need 4-a.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
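
The accounting in steps 1 to 4-a, together with the per-task memo from 4-b,
could be modelled in user space roughly as below. This is only a sketch
under assumptions, not kernel code: struct memcg_model, struct task_model,
memcg_external_gap() and account_page_dirtied() are hypothetical names, and
the counter in step 2 is read here as "pages charged to this memcg that
were dirtied by its own threads", which is one possible interpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct memcg_model {
    int  id;
    long nr_dirty;           /* step 1: dirty pages charged to this memcg */
    long nr_dirtied_by_own;  /* step 2: of those, dirtied by its own threads */
    long gap_thresh;         /* step 4-a: notify when the gap exceeds this */
};

struct task_model {
    struct memcg_model *memcg;  /* memcg the thread belongs to */
    int also_check_memcg_id;    /* step 4-b memo, -1 when nothing is noted */
};

/* Step 3: pages in this memcg that were dirtied by external threads. */
static long memcg_external_gap(const struct memcg_model *m)
{
    return m->nr_dirty - m->nr_dirtied_by_own;
}

/*
 * Model of "task t dirties a page charged to page_memcg".
 * Returns true when the gap is above the threshold, i.e. when a
 * notification to a userland daemon (step 4-a) would be due.
 */
static bool account_page_dirtied(struct task_model *t,
                                 struct memcg_model *page_memcg)
{
    assert(t != NULL && t->memcg != NULL && page_memcg != NULL);

    page_memcg->nr_dirty++;                       /* step 1 */
    if (t->memcg == page_memcg)
        page_memcg->nr_dirtied_by_own++;          /* step 2 */
    else
        t->also_check_memcg_id = page_memcg->id;  /* step 4-b memo */

    return memcg_external_gap(page_memcg) > page_memcg->gap_thresh;
}
```

In this model a throttling path would look at t->memcg first and then at
the memcg named by also_check_memcg_id. When and how to reset that memo is
left open, as in the mail above.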