Hi, On 05/05/2015 02:16 PM, Beata Michalska wrote: > Hi again, > > On 04/29/2015 11:13 AM, Greg KH wrote: >> On Wed, Apr 29, 2015 at 09:42:59AM +0200, Jan Kara wrote: >>> On Wed 29-04-15 09:03:08, Beata Michalska wrote: >>>> On 04/28/2015 07:39 PM, Greg KH wrote: >>>>> On Tue, Apr 28, 2015 at 04:46:46PM +0200, Beata Michalska wrote: >>>>>> On 04/28/2015 04:09 PM, Greg KH wrote: >>>>>>> On Tue, Apr 28, 2015 at 03:56:53PM +0200, Jan Kara wrote: >>>>>>>> On Mon 27-04-15 17:37:11, Greg KH wrote: >>>>>>>>> On Mon, Apr 27, 2015 at 05:08:27PM +0200, Beata Michalska wrote: >>>>>>>>>> On 04/27/2015 04:24 PM, Greg KH wrote: >>>>>>>>>>> On Mon, Apr 27, 2015 at 01:51:41PM +0200, Beata Michalska wrote: >>>>>>>>>>>> Introduce configurable generic interface for file >>>>>>>>>>>> system-wide event notifications, to provide file >>>>>>>>>>>> systems with a common way of reporting any potential >>>>>>>>>>>> issues as they emerge. >>>>>>>>>>>> >>>>>>>>>>>> The notifications are to be issued through generic >>>>>>>>>>>> netlink interface by newly introduced multicast group. >>>>>>>>>>>> >>>>>>>>>>>> Threshold notifications have been included, allowing >>>>>>>>>>>> triggering an event whenever the amount of free space drops >>>>>>>>>>>> below a certain level - or levels to be more precise as two >>>>>>>>>>>> of them are being supported: the lower and the upper range. >>>>>>>>>>>> The notifications work both ways: once the threshold level >>>>>>>>>>>> has been reached, an event shall be generated whenever >>>>>>>>>>>> the number of available blocks goes up again re-activating >>>>>>>>>>>> the threshold. >>>>>>>>>>>> >>>>>>>>>>>> The interface has been exposed through a vfs. Once mounted, >>>>>>>>>>>> it serves as an entry point for the set-up where one can >>>>>>>>>>>> register for particular file system events. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Beata Michalska <b.michalska@xxxxxxxxxxx> >>>>>>>>>>>> --- >>>>>>>>>>>> Documentation/filesystems/events.txt | 231 ++++++++++ >>>>>>>>>>>> fs/Makefile | 1 + >>>>>>>>>>>> fs/events/Makefile | 6 + >>>>>>>>>>>> fs/events/fs_event.c | 770 ++++++++++++++++++++++++++++++++++ >>>>>>>>>>>> fs/events/fs_event.h | 25 ++ >>>>>>>>>>>> fs/events/fs_event_netlink.c | 99 +++++ >>>>>>>>>>>> fs/namespace.c | 1 + >>>>>>>>>>>> include/linux/fs.h | 6 +- >>>>>>>>>>>> include/linux/fs_event.h | 58 +++ >>>>>>>>>>>> include/uapi/linux/fs_event.h | 54 +++ >>>>>>>>>>>> include/uapi/linux/genetlink.h | 1 + >>>>>>>>>>>> net/netlink/genetlink.c | 7 +- >>>>>>>>>>>> 12 files changed, 1257 insertions(+), 2 deletions(-) >>>>>>>>>>>> create mode 100644 Documentation/filesystems/events.txt >>>>>>>>>>>> create mode 100644 fs/events/Makefile >>>>>>>>>>>> create mode 100644 fs/events/fs_event.c >>>>>>>>>>>> create mode 100644 fs/events/fs_event.h >>>>>>>>>>>> create mode 100644 fs/events/fs_event_netlink.c >>>>>>>>>>>> create mode 100644 include/linux/fs_event.h >>>>>>>>>>>> create mode 100644 include/uapi/linux/fs_event.h >>>>>>>>>>> >>>>>>>>>>> Any reason why you just don't do uevents for the block devices today, >>>>>>>>>>> and not create a new type of netlink message and userspace tool required >>>>>>>>>>> to read these? >>>>>>>>>> >>>>>>>>>> The idea here is to have support for filesystems with no backing device as well. >>>>>>>>>> Parsing the message with libnl is really simple and requires few lines of code >>>>>>>>>> (sample application has been presented in the initial version of this RFC) >>>>>>>>> >>>>>>>>> I'm not saying it's not "simple" to parse, just that now you are doing >>>>>>>>> something that requires a different tool. If you have a block device, >>>>>>>>> you should be able to emit uevents for it, you don't need a backing >>>>>>>>> device, we handle virtual filesystems in /sys/block/ just fine :) >>>>>>>>> >>>>>>>>> People already have tools that listen to libudev for system monitoring >>>>>>>>> and management, why require them to hook up to yet-another-library? And >>>>>>>>> what is going to provide the ability for multiple userspace tools to >>>>>>>>> listen to these netlink messages in case you have more than one program >>>>>>>>> that wants to watch for these things (i.e. multiple desktop filesystem >>>>>>>>> monitoring tools, system-health checkers, etc.)? >>>>>>>> As much as I understand your concerns I'm not convinced uevent interface >>>>>>>> is a good fit. There are filesystems that don't have underlying block >>>>>>>> device - think of e.g. tmpfs or filesystems working directly on top of >>>>>>>> flash devices. These still want to send notification to userspace (one of >>>>>>>> primary motivation for this interfaces was so that tmpfs can notify about >>>>>>>> something). And creating some fake nodes in /sys/block for tmpfs and >>>>>>>> similar filesystems seems like doing more harm than good to me... >>>>>>> >>>>>>> If these are "fake" block devices, what's going to be present in the >>>>>>> block major/minor fields of the netlink message? For some reason I >>>>>>> thought it was a required field, and because of that, I thought we had a >>>>>>> "real" filesystem somewhere to refer to, otherwise how would userspace >>>>>>> know what filesystem was creating these events? >>>>>>> >>>>>>> What am I missing here? >>>>>>> >>>>>>> confused, >>>>>>> >>>>>>> greg k-h >>>>>>> >>>>>> >>>>>> For those 'fake' block devs, upon mount, get_anon_bdev will assign >>>>>> the major:minor numbers. Userspace might get those through stat. >>>>> >>>>> How can userspace do the mapping backwards from this "anonymous" >>>>> major:minor number for these types of filesystems in such a way that >>>>> they can "know" how to report the block device that is causing the >>>>> event? >>>>> >>>>> thanks, >>>>> >>>>> greg k-h >>>>> >>>> >>>> It needs to be done internally by the app but is doable. >>>> The app knows what it is watching, so it can maintain the mappings. >>>> So prior to activating the notifications it can call 'stat' on the mount point. >>>> Stat struct gives the 'st_dev' which is the device id. Same will be reported >>>> within the message payload (through major:minor numbers). So having this, >>>> the app is able to get any other information it needs. >>>> Note that the events refer to the file system as a whole and they may not >>>> necessarily have anything to do with the actual block device. >> >> How are you going to show an event for a filesystem that is made up of >> multiple block devices? >> >>> Or you can use /proc/self/mountinfo for the mapping. There you can see >>> device numbers, real device names if applicable and mountpoints. This has >>> the advantage that it works even if filesystem mountpoints change. >> >> Ok, then that brings up my next question, how does this handle >> namespaces? What namespace is the event being sent in? block devices >> aren't namespaced, but the mount points are, is that going to cause >> problems? >> >> thanks, >> >> greg k-h >> > > Getting back to the namespaces ... > In the current state the notifications will be sent to the init network namespace, > which means that processes belonging to a different net namespace will not > be able to receive them. To be more precise, those processes will not be > able to subscribe to the multicast group, though this can be easily changed. > Furthermore, the notifications might also be sent to specific namespace. > In this case, the one, with which the trace for the mount point has been registered, > which as I believe would be the best approach. > > As for the mount namespaces, reading the config file needs to be slightly tweaked, > to hide away all the registered mount points which does not belong to the current > mount namespace. > > Still, there is one possible 'issue' - the private/slave mount points. > As the notifications will be sent to all the listeners (within the same netns), > the events might be visible to processes outside the given mount ns. > This should be limited to only those listeners that share the mount namespace, > to which such private/slave mount points belong. As using the generic netlink > to filter the outgoing messages is doable (with small changes to current > implementation), the filters themselves seem rather cumbersome, as they would require > finding the socket’s owner mount namespace, which just doesn't seems right. > On the other hand, identifying the file system, which generated the event, will > not be possible for processes outside such namespace, as device major:minor > numbers are not bound to any namespace (afaict) so they will not provide any > valid information. They will remain unresolved. > > The best way out here though, is to leave it to userspace to properly setup new namespaces: > the mount namespace with possible private/slave mounts should have a separate > network namespace to isolate the potential fs events, if required. > > > BR > Beata > > > I'm not really sure where we are with this RFC now (?). Just wanted to let You know I won't be available for the next two weeks, in case this comes around. Best Regards Beata -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html