* dan.magenheimer@xxxxxxxxxx <dan.magenheimer@xxxxxxxxxx> [2010-08-20 08:14:59]:

> Hi Christophe (and others interested in cleancache progress) --
>
> Thanks for taking some time to talk with me about cleancache
> at LSF summit! You had some interesting thoughts and suggestions
> that I said I would investigate. They are:
>
> 1) use inode kva as key instead of i_ino
> 2) eliminate cleancache shim and call zcache directly
> 3) fs's requiring key > inode_t (e.g. 64-bit-fs on 32-bit-kernel)
> 4) eliminate fs-specific code entirely (e.g. "opt-in")
> 5) eliminate global variable
>
> Here are my conclusions:
>
> 1) You suggested using the inode kva as a "key" for cleancache.
>    I think your goal was to make it more fs-independent and also
>    to eliminate the need for using a per-fs enabler and "pool id".
>    I looked at this but it will not work because cleancache
>    retains page cache data pages persistently even when the
>    inode has been pruned from the inode_unused_list and only
>    flushes the data pages if the file gets removed/truncated. If
>    cleancache used the inode kva, there would be coherency issues
>    when the inode kva is reused. Alternatively, if cleancache
>    flushed the pages when the inode kva was freed, much of
>    the value of cleancache would be lost because the cache
>    of pages in cleancache is potentially much larger than
>    the page cache and is most useful if the pages survive
>    inode cache removal.
>
>    If I misunderstood your proposal or if you disagree, please
>    let me know.
>
> 2) You suggested eliminating the cleancache shim layer and just
>    directly calling zcache, effectively eliminating Xen as
>    a user. During and after LSF summit, I talked to developers
>    from Google who are interested in investigating the cleancache
>    interface for use with cgroups, an IBM developer who was
>    interested in cleancache for optimizing NUMA, and soon I
>    will be talking to HP Labs about using it as an interface
>    for "memory blades". I also think Rik van Riel and Mel Gorman
>    were intrigued about its use for collecting better memory
>    utilization statistics to drive guest/host memory "rightsizing".
>    While it is true that none of these are current users yet, even
>    if you prefer to ignore Xen tmem as a user, it seems silly to
>    throw away the cleanly-layered generic cleancache interface now,
>    only to add it back later when more users are added.
>
> 3) You re-emphasized the problem where cleancache's use of
>    the inode number as a key will cause problems on many 64-bit
>    filesystems, especially when running on a 32-bit kernel. With
>    help from Andreas Dilger, I'm trying to work out a generic
>    solution for this using s_export_op->encode_fh, which would
>    be used for any fs that provides it to guarantee a unique
>    multi-word key for a file, while preserving the
>    shorter i_ino as a key for fs's for which i_ino is unique.
>
> 4) Though you were out of the room during the cleancache
>    lightning talk, other filesystem developers seemed OK
>    with the "opt-in" approach (as documented on lwn.net)...
>    one even asked "can't you just add a bit to the superblock?"
>    to which I answered "that's essentially what the one
>    line opt-in addition does". Not sure if you are still
>    objecting to that, but especially given that the 64-bit-fs-on
>    32-bit-kernel issue above only affects some filesystems,
>    I'm still thinking it is necessary.
>
> 5) You commented (before LSF) that the global variable should
>    be avoided, which is certainly valid, and I will try Nitin's
>    suggestion to add a registration interface.
>
> Did I miss anything?
>
> I plan to submit a V4 for cleancache soon, and hope you will
> be inclined to ack this time.
>

Hi, Dan,

Sorry for commenting on your post so late. I've had some time to
read through your approach and compare it to my approach
(http://www.linuxsymposium.org/2010/view_abstract.php?content_key=32)
and I had a few quick questions:

1. Can't this be done at the MM layer? Why the filesystem hooks? Is it
   to enable faster block devices in the reclaim hierarchy?
2. I don't see a mention of slabcache in your approach, reclaiming
   free pages or freeing potentially free slab pages.

-- 
	Three Cheers,
	Balbir