Re: Re: RFC: fsnotify - Add support for ignoring self initiated events

On Wednesday, September 11, 2013 10:27:54 J. Bruce Fields wrote:
> What sort of cache consistency does this give you?  (It's not perfect,
> because you always get the notification after the event has already
> happened, right?)

It is certainly not atomic.  However, our (nfs-ganesha) use cases can tolerate 
some level of delay.

1. We populate an avl tree with the names from a readdir.  If another process, 
e.g. samba, adds/removes a file we want to know about it.  Readdir already has 
races in that the same thing can happen in a portion of the directory already 
scanned while we are still reading the last of it.

2. We use the avl tree for lookups.  Same thing applies here.  If the
directory is current, we can serve the lookup from cache.  If a lookup comes
in before the event does, the next step in the protocol would be to do
something directly to the file, which will discover the removal.  We've always
had the problem of lookups for files that don't exist yet and we cope with
retries or ???.  All we would get here is a bit more delay.

The event will trigger an upcall into the cache inode layer to mark the
directory as "stale".  The inotify payload gives us the name as well, so we can
do things to the tree like add/remove the entry.  In the lookup case, if we
know the directory is stale, we can go directly to the filesystem rather than
assume the directory is consistent.

In the end, yes, there is an inconsistency window, but it does close.
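
To make the stale/patch idea above concrete, here is a rough sketch in C.  It
is not the ganesha code: struct dir_cache, cache_apply_event() and
cache_lookup() are hypothetical names, and a small fixed array stands in for
the avl tree.  The mask and name arguments are assumed to come from the mask
and name fields of a struct inotify_event read from the inotify fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

#define MAX_ENTRIES 64
#define MAX_NAME    256

/* Hypothetical stand-in for one cached directory (the real cache is an avl tree). */
struct dir_cache {
    char names[MAX_ENTRIES][MAX_NAME];
    int  count;
    bool stale;   /* set when the cache can no longer be trusted */
};

enum lookup_result { CACHE_HIT, CACHE_MISS, CACHE_ASK_FS };

static int cache_find(const struct dir_cache *d, const char *name)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return i;
    return -1;
}

/* Patch the cache from one event, or mark it stale if we cannot patch it. */
void cache_apply_event(struct dir_cache *d, uint32_t mask, const char *name)
{
    assert(d != NULL);

    /* Queue overflow or no usable name: we no longer know what changed. */
    if ((mask & IN_Q_OVERFLOW) || name == NULL || name[0] == '\0' ||
        strlen(name) >= MAX_NAME) {
        d->stale = true;
        return;
    }

    int i = cache_find(d, name);

    if (mask & (IN_CREATE | IN_MOVED_TO)) {
        if (i >= 0)
            return;                       /* already cached */
        if (d->count == MAX_ENTRIES) {    /* no room: fall back to stale */
            d->stale = true;
            return;
        }
        strcpy(d->names[d->count++], name);
    } else if (mask & (IN_DELETE | IN_MOVED_FROM)) {
        if (i >= 0) {
            /* Fill the hole with the last entry; order does not matter here. */
            if (i != d->count - 1)
                strcpy(d->names[i], d->names[d->count - 1]);
            d->count--;
        }
    } else {
        d->stale = true;                  /* some other change to the directory */
    }
}

/* Serve a lookup from cache only while the directory is not stale. */
enum lookup_result cache_lookup(const struct dir_cache *d, const char *name)
{
    assert(d != NULL && name != NULL);
    if (d->stale)
        return CACHE_ASK_FS;
    return cache_find(d, name) >= 0 ? CACHE_HIT : CACHE_MISS;
}
```

A CACHE_ASK_FS result is the "go directly to the filesystem" path; how and
when the stale bit gets cleared again (presumably by a fresh readdir) is left
out of the sketch.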
> 
> It looks like you're using the tgid.  I guess Ganesha runs a bunch of
> threads all sharing the same tgid?  Would a server using multiple
> processes instead need a different interface?

Yes, it is using the tgid which, of course, implies that the server is NPTL
based (or equivalent).  It does not take into consideration the case of
multiple processes with shared memory.  The completely separate process case is
covered by the samba+nfs-ganesha case: each knows what the other is doing,
with its own events filtered out by the ignore flag.
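
For comparison, this is roughly what the equivalent filtering looks like when
done in user space with today's fanotify, i.e. the my_pid == e.pid test from
the original posting.  It is only a sketch: count_foreign_events() is a
hypothetical helper, and the buffer is assumed to have been filled by a read()
on a fanotify fd.  The caller would pass getpid() as self, which under NPTL is
the tgid shared by all threads of the server.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/*
 * Walk a buffer of fanotify events and count those not caused by `self`.
 * A real event loop would act on each foreign event here and would also
 * have to close(m->fd) for every event that carries a file descriptor.
 */
size_t count_foreign_events(const void *buf, size_t len, pid_t self)
{
    assert(buf != NULL);

    const struct fanotify_event_metadata *m = buf;
    size_t foreign = 0;

    while (FAN_EVENT_OK(m, len)) {
        if (m->pid != self)       /* the test the ignore flag would make unnecessary */
            foreign++;
        m = FAN_EVENT_NEXT(m, len);
    }
    return foreign;
}
```

With filtering done at event time in the kernel, the self-generated events
would never be allocated or copied up, so this comparison and the events it
throws away would disappear from the loop.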
> 
> --b.
> 
> On Wed, Sep 04, 2013 at 11:30:59AM -0700, Jim Lieb wrote:
> > Our use case is an NFS+pNFS+9P user mode server.  We need to keep our
> > caches (dentry+inode) current with the underlying kernel.  To do this we
> > need inotify to feed filesystem events to our upcall infrastructure.  We
> > place a watch on each directory we have cached and any events that morph
> > that directory would cause invalidates and/or updates to those entries.
> > 
> > The current fsnotify subsystem does most of what we want but implicit in
> > the dnotify/inotify/fanotify interfaces is the assumption that the watcher
> > is an "innocent bystander" whose sole/main function is to draw/remove
> > icons on a window when someone else adds/removes things from a directory.
> > Part of our use case is that the ganesha.nfsd server is co-resident with a
> > CIFS server which is also exporting the same filesystem(s) and service
> > management tools that modify the filesystem structure (snapshots and
> > volume adds...). The current inotify interface will send up all events
> > but if both servers are equally busy we get two bad results:
> > 
> > 1. each server gets twice the traffic it really needs (theirs and ours).
> > 
> > 2. there is no simple way to tell their events from ours in each event.
> > 
> > This patch set adds a new watch/mark flag (FS_IGNORE_ME) to fsnotify.
> > Setting this flag causes the watching process's pid to be stored in the
> > mark for the inode.  The flag is tested at event time and if set and if
> > the pid of the event generating process matches the stored pid, the event
> > is ignored, saving the overhead of allocating an event, pushing it up to
> > user space only to be rejected. Being in fsnotify makes it available to
> > any notification scheme built on fsnotify.
> > 
> > The IN_IGNORE_ME flag bit is added to inotify.  When set, none of the
> > other event flags will generate an event if the calling process generated
> > the event.  Given the current way that inotify_add_watch() validates the
> > flags argument, discovering whether the kernel supports the flag requires
> > an extra test (set the watch and generate an event...).
> > 
> > The FAN_IGNORE_ME flag bit does the same for fanotify.  fanotify in
> > current kernels will return an EINVAL error if this bit is set, making
> > discovery easier.  One performance side effect is that this flag
> > eliminates the need and overhead for a test of my_pid == e.pid in the
> > event processing loop.
> > 
> > We chose inotify rather than the current fanotify because we need the
> > extra events that fanotify cannot (currently) support.  dnotify was not
> > touched because it is both obsolete and its api makes this extension
> > difficult.
> > 
> > Please review and comment.  If it is acceptable, please ACK and merge.
> > 
> > Thanks
> > 
> > Jim Lieb, NFS Ganesha project
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jim Lieb
Linux Systems Engineer
Panasas Inc.

"If ease of use was the only requirement, we would all be riding tricycles"
- Douglas Engelbart 1925–2013