On Wed 01-10-14 07:50:21, Trond Myklebust wrote: > On Wed, Oct 1, 2014 at 5:05 AM, Jan Kara <jack@xxxxxxx> wrote: > > On Tue 30-09-14 17:13:19, Trond Myklebust wrote: > >> On Tue, Sep 30, 2014 at 4:53 PM, Zach Brown <zab@xxxxxxxxx> wrote: > >> > On Tue, Sep 30, 2014 at 12:48:45PM +0200, Jan Kara wrote: > >> >> On Tue 30-09-14 10:11:32, Thanos Makatos wrote: > >> >> > > >> >> > Regarding extending the ioctl to invalidate the page cache, do you have > >> >> > any suggestions where I could start looking? > >> >> You just need to call invalidate_inode_pages2(). That is going to do all > >> >> you need. > >> >> > >> >> > Would such a new ioctl have any chance to be accepted upstream? > >> >> I believe a possibility for a file to be fully flushed from page cache is > >> >> useful at times and if you present well your usecase there are reasonable > >> >> chances it will get accepted upstream. > >> > > >> > Agreed, this seems reasonable. How many times have we all dropped our > >> > entire cache just 'cause we didn't have a more precise tool? > >> > > >> > $ grep -ri drop_caches xfstests/ > >> > xfstests/src/fsync-tester.c: if ((fd = open("/proc/sys/vm/drop_caches", O_WRONLY)) < 0) { > >> > xfstests/src/stale_handle.c: system("echo 3 > /proc/sys/vm/drop_caches"); > >> > xfstests/common/quota: echo 3 > /proc/sys/vm/drop_caches > >> > > >> > The last one even says: > >> > > >> > # XXX: really need an ioctl instead of this big hammer > >> > echo 3 > /proc/sys/vm/drop_caches > >> > > >> > :) > >> > > >> > >> It would definitely be useful for NFS, however we'd want the option of > >> clearing the cached metadata too (acls, mode bits, owner/group owner, > >> etc.) > > Hum, I can imagine you might be somehow successful in flushing ACLs but > > how would you like to flush mode or owner? It's not like you can just > > reload inode from disk and overwrite what you have in memory. What would > > look sane is to push inode with all metadata it caches out of memory but > > once somehow holds the inode (and ioctl() itself will have a file handle of > > the inode), you just cannot. So I'm not sure if you meant something > > different or if this was just a wish without deeper thought. > > You can and you _must_ reload the inode if it changes on the server. > That's what makes distributed filesystems "interesting" as far as > caching goes. What we do in the cases where we think the file metadata > may have changed, is to mark the existing cached metadata as being > stale so that we know that we need to revalidate before trying to use > it. Ah OK, I somehow thought about revalidation automatically happening for any filesystem and found that impossible. I can imagine doing this specifically for NFS which is designed to handle such things is possible :) > So being able to tell in the ioctl() whether or not the application > thinks the data or metadata (or both) may have changed on the remote > disk is actually a very useful feature if you are trying to do things > like distributed compiles. > > I've had an experimental NFS-only ioctl() based caching interface > kicking around in my git repository for a couple of years now > (http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/ioctl). > If we are planning on doing something at the VFS level, then it would > be nice to at least duplicate the functionality from the first 2 > patches. Thanks for the pointer. The first two patches look useful and I think we can design the ioctl interface to accommodate the NFS usecase (although most filesystems will just return EOPNOTSUPP in case someone asks for metadata invalidation). The interface in your patches looks mostly OK, I'd just rather use 'flags' than 'cmd'. So something like: struct invalidate_arg { u64 flags; u64 start; u64 len; }; Where 'flags' can be: INVAL_METADATA - invalidate inode itself, acls, etc. if fs supports it (this can be more finegrained if that's useful. Is it?) INVAL_DATA - invalidate data in range <start, start+len) We can have a fs callback that gets called for this ioctl to do the work, most filesystems would just leave it at NULL which would fall back to the default function which just handles INVAL_DATA and returns EOPNOTSUPP if INVAL_METADATA is set. NFS could then implement its own thing for INVAL_METADATA and call the generic function for handling INVAL_DATA. Thoughts? Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html