Re: [RFC v3 1/4] fs: Add generic file system event notifications

Beata Michalska <b.michalska@xxxxxxxxxxx> · Wed, 17 Jun 2015 11:22:49 +0200

Hi,

On 06/16/2015 06:21 PM, Al Viro wrote:
> On Tue, Jun 16, 2015 at 03:09:30PM +0200, Beata Michalska wrote:
>> Introduce configurable generic interface for file
>> system-wide event notifications, to provide file
>> systems with a common way of reporting any potential
>> issues as they emerge.
>>
>> The notifications are to be issued through generic
>> netlink interface by newly introduced multicast group.
>>
>> Threshold notifications have been included, allowing
>> triggering an event whenever the amount of free space drops
>> below a certain level - or levels to be more precise as two
>> of them are being supported: the lower and the upper range.
>> The notifications work both ways: once the threshold level
>> has been reached, an event shall be generated whenever
>> the number of available blocks goes up again re-activating
>> the threshold.
>>
>> The interface has been exposed through a vfs. Once mounted,
>> it serves as an entry point for the set-up where one can
>> register for particular file system events.
> 
> Hmm...
> 
> 1) what happens if two processes write to that file at the same time,
> trying to create an entry for the same fs?  WARN_ON() and fail for one
> of them if they race?
>

There are some limits here, I admit. The entries in the config file
might be overwritten at any time: there is no support for multiple
config entries for the same mounted fs. This is mainly due to the
threshold notifications: handling potentially numerous threshold limits
each time the number of available blocks changes didn't seem like a good
idea. So this is more like a global config, resembling the sysfs
fs-related tuning options.
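
To illustrate the "one entry per mounted fs" behaviour described above,
here is a minimal user-space sketch. It is not code from the patch: the
table, the struct and the names cfg_set(), cfg_get() and cfg_count() are
all hypothetical, and a plain pointer stands in for the super block.

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the patch. */
#define MAX_ENTRIES 8

struct fs_cfg_entry {
	const void *sb;           /* stands in for the super block identity */
	unsigned long lower_thr;  /* assumed: lower threshold, in blocks */
	unsigned long upper_thr;  /* assumed: upper threshold, in blocks */
	int used;
};

static struct fs_cfg_entry cfg_table[MAX_ENTRIES];

/* Create or overwrite the single entry for sb. Returns 0, or -1 if full. */
int cfg_set(const void *sb, unsigned long lower, unsigned long upper)
{
	struct fs_cfg_entry *slot = NULL;
	size_t i;

	for (i = 0; i < MAX_ENTRIES; i++) {
		if (cfg_table[i].used && cfg_table[i].sb == sb) {
			slot = &cfg_table[i];   /* existing entry gets replaced */
			break;
		}
		if (!cfg_table[i].used && !slot)
			slot = &cfg_table[i];   /* remember the first free slot */
	}
	if (!slot)
		return -1;

	slot->sb = sb;
	slot->lower_thr = lower;
	slot->upper_thr = upper;
	slot->used = 1;
	return 0;
}

/* Look up the entry for sb, or NULL if none is registered. */
const struct fs_cfg_entry *cfg_get(const void *sb)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (cfg_table[i].used && cfg_table[i].sb == sb)
			return &cfg_table[i];
	return NULL;
}

/* Number of registered entries. */
int cfg_count(void)
{
	int n = 0;
	size_t i;

	for (i = 0; i < MAX_ENTRIES; i++)
		n += cfg_table[i].used;
	return n;
}
```

In this model a second cfg_set() for the same sb replaces the earlier
thresholds instead of adding another entry, so only one pair of
thresholds is ever checked per fs.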

> 2) what happens if fs is mounted more than once (e.g. in different
> namespaces, or bound at different mountpoints, or just plain mounted
> several times in different places) and we add an event for each?
> More specifically, what should happen when one of those gets unmounted?
> 

Each write to that file is handled within the current namespace.
Setting up an entry for a mount point from a different mnt namespace
requires switching to that ns. As for bind mounts: the entry exists
until the mount point it has been registered with is detached.
The events can only be registered for one of the mount points,
as they are tied to the super block, so one cannot have a separate
config entry for each bind mount.

> 3) what's the meaning of ->active?  Is that "fs_drop_trace_entry() hadn't
> been called yet" flag?  Unless I'm misreading it, we can very well get
> explicit removal race with umount, resulting in cleanup_mnt() returning
> from fs_event_mount_dropped() before the first process (i.e. write
> asking to remove that entry) gets around to its deactivate_super(),
> ending up with umount(2) on a filesystem that isn't mounted anywhere
> else reporting success to userland before the actual fs shutdown, which
> is not a nice thing to do...
> 

The 'active' flag simply means that the entry for a given mounted fs
is still valid, in the sense that the events are still required: the
entry in the config file has not been removed. When the trace is being
removed, its 'active' field gets invalidated to mark that the events for
the related fs are no longer needed. deactivate_super() should get called
only once, dropping the reference acquired while creating the entry
(fs_new_trace_entry).

While in fs_drop_trace_entry, the lock is held (in both cases: unmount
and explicit entry removal). fs_drop_trace_entry will silently skip all
the clean-up if the entry is inactive. I might be missing something here,
though. If so, I would really appreciate some more of your comments.
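
The intended "drop only once" behaviour can be sketched in user space as
below. This is a simplified model, not the patch code: the struct, the
sb_refs counter and the functions entry_init() and entry_drop() are
hypothetical, and a pthread mutex stands in for whatever lock the kernel
code holds. It only shows the "skip if inactive" check; it says nothing
about the ordering of the final reference drop relative to umount(2)
returning.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a trace entry; names are illustrative only. */
struct trace_entry {
	pthread_mutex_t lock;
	bool active;   /* events are still wanted for this fs */
	int sb_refs;   /* stands in for the super block reference */
};

void entry_init(struct trace_entry *e)
{
	pthread_mutex_init(&e->lock, NULL);
	e->active = true;
	e->sb_refs = 1;   /* reference taken when the entry is created */
}

/*
 * Called from both paths (explicit removal and unmount).
 * Returns true only for the caller that actually did the clean-up.
 */
bool entry_drop(struct trace_entry *e)
{
	bool did_cleanup = false;

	pthread_mutex_lock(&e->lock);
	if (e->active) {
		e->active = false;
		e->sb_refs--;   /* models the single deactivate_super() call */
		did_cleanup = true;
	}
	pthread_mutex_unlock(&e->lock);

	return did_cleanup;
}
```

Whichever path takes the lock first clears 'active' and drops the
reference; the other one finds the entry inactive and does nothing.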

> 4) test in fs_event_mount_dropped() looks very odd - by that point we
> are absolutely guaranteed to have ->mnt_ns == NULL.  What's that supposed
> to do?
>

I have totally missed the fact that the mnt namespace pointer is
invalidated during umount_tree - cannot really explain why that did
happen. So thank you for pointing that out.

This should simply be checking if it's still valid. This verification is
needed in case the mount that is being detached is not the one the events
have been registered with, as they refer to the fs, not a particular
mount point. This is the case with mnt namespaces: let's assume one
registers for events for a particular mounted fs in the init mnt
namespace, then a new mnt namespace is created with shared mount points
being cloned, so the same mount point exists in both namespaces. Now if
this mount point gets detached, either through umount or while the mnt
namespace is being swept out, the entry in the init mnt namespace should
remain untouched. The same applies the other way round.
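
The namespace scenario above could be modelled roughly as follows. This
is only a sketch under stated assumptions: all types and the function
should_drop() are hypothetical, and it assumes the namespace the detached
mount belonged to is still known at that point. As noted in this thread,
->mnt_ns is already NULL by the time fs_event_mount_dropped() runs, so
the real code would have to obtain that information some other way.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects; names are illustrative. */
struct mnt_ns_model {
	int id;
};

struct mount_model {
	const void *sb;                 /* super block behind this mount */
	const struct mnt_ns_model *ns;  /* assumed: namespace it belonged to */
};

struct event_entry {
	const void *sb;                 /* fs the events were registered for */
	const struct mnt_ns_model *ns;  /* namespace used at registration */
	bool active;
};

/*
 * Should detaching mount m remove entry e? Only if it is the same fs
 * and the mount lived in the namespace the entry was registered in.
 */
bool should_drop(const struct event_entry *e, const struct mount_model *m)
{
	return e->active && e->sb == m->sb && e->ns == m->ns;
}
```

With an entry registered in the init namespace, detaching the cloned
mount in the new namespace gives false here, while detaching the mount in
the init namespace gives true.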

> 
> Al, trying to figure out the lifetime rules in all of that...
> 

Best Regards
Beata