On Wed, 30 Jun 2010 21:16:43 -0700
Greg Thelen <gthelen@xxxxxxxxxx> wrote:

> > ==
> > register inotify and add watches.
> > The watches will see OPEN and IN_DELETE_SELF.
> >
> > run 2 threads.
> >
> > Thread1:
> > while(1) {
> >     read() // check events from inotify.
> >     maintain opened-file information.
> > }
> >
> > Thread2:
> > while (1) {
> >     check opened-file information.
> >     select a file // you may implement some scheduling, here.
> >     open,
> >     mmap
> >     mincore() .... checks the file is cached.
> >     madvise()
> >     // if you want, touch pages and add Access bit to them.
> >     close(),
> >
> >     sleep if necessary.
> > }
> > ==
> > batch-style cron-job rather than sleep will not be very bad for usual use.
> > But we may need some interface to implement some clever algorithm.
>
> I have to collect some data about expected usages of this feature. I
> will have more information tomorrow. Depending on how quickly the
> charges need to be corrected or the number of opened files, this
> daemon may end up doing a lot of polling to correct memory charges.
>
maybe. but many applications work with a lot of jobs without special kernel support.

> >> If the number of directories within /tmp/db is large, then inotify()
> >> may be expensive. I don't think this is a problem.
> >>
> >> Another worry I have is that if for some reason the daemon is started
> >> after the job, or if the daemon crashes and is restarted, then files
> >> may have been opened and charged to cg11 without the inotify being
> >> set up.
> > yes.
> >
> >> The daemon would have problems finding the pages that were
> >> charged to cg11 and need to be moved to cg1. The daemon could scan
> >> the open file table of T1, but any files that are no longer opened may
> >> be charged to cg11 with no way for the daemon to find them.
> >>
> >
> > Above thread-1 can maintain "opened-file" database.
> > Or you can run a recovery-script to open /proc/<xxxx>/fd of processes
> > to trigger OPEN events.
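For concreteness, Thread2's per-file step (open, mmap, mincore, madvise, close) might look like the C sketch below. It assumes Linux with glibc; scan_cached_pages() is a hypothetical name, and because the mail does not say which madvise() advice the daemon would pass, the advice value is left to the caller (-1 skips the call). Note that mincore() only reports page cache residency; it says nothing about which memcg a page is charged to, and nothing here moves a charge by itself.

```c
#define _DEFAULT_SOURCE          /* exposes mincore() in glibc's <sys/mman.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper for one Thread2 step: map `path`, count how many of
 * its pages are resident in the page cache (mincore), then optionally pass
 * `advice` to madvise(). advice = -1 skips madvise(); which advice a real
 * daemon would use is an assumption left to the caller. Stores the file's
 * page count in *total_pages and returns the resident count, or -1 on error.
 */
long scan_cached_pages(const char *path, int advice, long *total_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    *total_pages = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {               /* empty file: nothing to map */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                           /* the mapping outlives the fd */
    if (addr == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec == NULL) {
        munmap(addr, len);
        return -1;
    }

    long resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;      /* bit 0 set: page is resident */
        if (advice >= 0)
            (void)madvise(addr, len, advice);
        *total_pages = (long)npages;
    }

    free(vec);
    munmap(addr, len);
    return resident;
}
```

A daemon would call this for each file chosen from Thread1's opened-file table and use the resident count to decide whether the file is worth recharging.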
> >
>
> If a file has been unlinked, then the OPEN events would need to scan
> /proc/xxx/fd to find an open file handle to open. This is probably a
> corner case, but I wanted to mention it.
>
sure.

> > But yes, some in-kernel approach may be required. as...new interface to memcg
> > rather than madvise.
> >
> > /memory.move_file_caches
> > - when you open this and write()/ioctl() file descriptor to this file,
> >   all on-memory pages of files will be moved to this cgroup.
>
> Are you suggesting that this move_file_caches interface would
> associate the given file, dentry, or inode with the cgroup so that
> future charges are charged to the intended cgroup? Or (I suspect)
> that the daemon would need to periodically use this routine to
> correct any incorrect charges.
>
My idea is to use it for recharging instead of mincore()+madvise().

> > Hmm...we may be able to add an interface to know last-pagecache-update time.
> > (Because access-time tends to be omitted at mount....)
>
> Are you thinking that we could introduce a cgroup-wide attribute
> (maybe a timestamp, or increasing sequence number, or even just a bit)
> that would be set whenever a cgroup statistic (page cache usage in
> this case) was updated? This bit would be cleared whenever all needed
> migrations occurred. The daemon could poll this bit to know if any
> migrations were needed.

Now, memory cgroup has a "threshold" notifier. I think it's useful in this case.

>
> Another aspect that I am thinking would have to be added to the daemon
> would be oom handling. If cg11 is charged for non-reclaimable files
> (tmpfs) that belong to cg1, then the task may oom. The daemon would
> have to listen for oom and then immediately migrate the charge from
> cg11 to cg1 to lower memory pressure in cg11.
>
Now, memory cgroup has an interface to disable the oom-killer plus an oom-notifier.
I think it's useful.
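The threshold and OOM notifiers mentioned above are registered through cgroup.event_control in the cgroup v1 memory controller: create an eventfd, open the control file (memory.usage_in_bytes for a threshold, memory.oom_control for OOM), and write "<event_fd> <control_fd> [threshold]" to cgroup.event_control. The C sketch below assumes a mounted cgroup v1 memory hierarchy; the helper names are hypothetical and the memcg directory path is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build the "<event_fd> <control_fd> [arg]" line that the cgroup v1
 * memory controller expects in cgroup.event_control. `arg` is e.g. a
 * threshold in bytes for memory.usage_in_bytes, or NULL for
 * memory.oom_control. Returns snprintf()'s result. */
int format_event_control(char *buf, size_t n, int efd, int cfd, const char *arg)
{
    assert(buf != NULL && n > 0);
    if (arg != NULL)
        return snprintf(buf, n, "%d %d %s", efd, cfd, arg);
    return snprintf(buf, n, "%d %d", efd, cfd);
}

/* Hypothetical helper: register an eventfd notification on `control` in
 * memcg directory `cg`. On success returns the eventfd (a blocking read()
 * on it wakes when the event fires) and stores the control file's fd in
 * *cfd_out; the caller closes both when done. Returns -1 on error. */
int memcg_register_event(const char *cg, const char *control,
                         const char *arg, int *cfd_out)
{
    char path[4096], line[64];
    int cfd, ecfd, efd, len;

    *cfd_out = -1;
    snprintf(path, sizeof path, "%s/%s", cg, control);
    cfd = open(path, O_RDONLY);
    snprintf(path, sizeof path, "%s/cgroup.event_control", cg);
    ecfd = open(path, O_WRONLY);
    efd = eventfd(0, 0);
    if (cfd < 0 || ecfd < 0 || efd < 0)
        goto fail;

    len = format_event_control(line, sizeof line, efd, cfd, arg);
    if (len < 0 || (size_t)len >= sizeof line)
        goto fail;
    if (write(ecfd, line, strlen(line)) != (ssize_t)strlen(line))
        goto fail;

    close(ecfd);
    *cfd_out = cfd;
    return efd;

fail:
    if (efd >= 0)
        close(efd);
    if (ecfd >= 0)
        close(ecfd);
    if (cfd >= 0)
        close(cfd);
    return -1;
}

/* Hypothetical helper: disable the OOM killer for `cg` by writing "1" to
 * memory.oom_control, so tasks wait instead of being killed while a
 * daemon corrects the charges. Returns 0 on success, -1 on error. */
int memcg_disable_oom_kill(const char *cg)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/memory.oom_control", cg);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int ok = write(fd, "1", 1) == 1;
    close(fd);
    return ok ? 0 : -1;
}
```

A daemon might call memcg_disable_oom_kill() on cg11's directory, register on "memory.oom_control", then block in read() on the returned eventfd for its 8-byte counter and, on wakeup, move the misplaced charges from cg11 to cg1.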

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx
For more info on Linux MM, see: http://www.linux-mm.org/ .