On Thu, Apr 20, 2023 at 10:46 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Thu, Apr 20, 2023 at 09:12:52AM +0300, Amir Goldstein wrote: > > On Wed, Apr 19, 2023 at 8:19 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote: > > > > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > > > > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > > > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > [...] > > > > > > > > Just thought of another reason: > > > > > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > > > > > so it does not depend on filesystem having a valid f_fsid nor > > > > > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > > > > > only MNTID record (I will amend the patch with this minor change). > > > > > > > > > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c > > > > > > > > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the > > > > vfs interfaces, including export_ops. > > > > > > The term "pseudo" is somewhat well-defined though, no? It really just > > > means that there's no backing device associated with it. So for example, > > > anything that uses get_tree_nodev() including tmpfs. If erofs is > > > compiled with fscache support it's even a pseudo fs (TIL). > > > > > > > Ok, "pseudo fs" is an ambiguous term. > > > > For the sake of this discussion, let's refer to fs that use get_tree_nodev() > > "non-disk fs". > > > > But as far as fsnotify is concerned, tmpfs is equivalent to xfs, because > > all of the changes are made by users via vfs. > > > > Let's call fs where changes can occur not via vfs "remote fs", those > > include the network fs and some "internal fs" like the kernfs class of fs > > and the "simple fs" class of fs (i.e. simple_fill_super). > > > > With all the remote fs, the behavior of fsnotify is (and has always been) > > undefined, that is, you can use inotify to subscribe for events and you > > never know what you will get when changes are not made via vfs. > > > > Some people (hypothetical) may expect to watch nsfs for dying ns > > and may be disappointed to find out that they do not get the desired > > IN_DELETE event. > > > > We have had lengthy discussions about remote fs change notifications > > with no clear decisions of the best API for them: > > https://lore.kernel.org/linux-fsdevel/20211025204634.2517-1-iangelak@xxxxxxxxxx/ > > > > > > > > > > and also I fixed its f_fsid recently: > > > > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs > > > > > > Well thank you for that this has been very useful in userspace already > > > I've been told. > > > > > > > > > > > > At the risk of putting my foot in my mouth, what's stopping us from > > > > > making them all support f_fsid? > > > > > > > > Nothing much. Jan had the same opinion [1]. > > > > > > I think that's what we should try to do without having thought too much > > > about potential edge-cases. > > > > > > > > > > > We could do either: > > > > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid > > > > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid > > > > 3. randomize s_uuid for simple fs (like tmpfs) > > > > 4. any combination of the above and more > > > > > > > > Note that we will also need to decide what to do with > > > > name_to_handle_at() for those pseudo fs. > > > > > > Doing it on the fly during vfs_statfs() feels a bit messy and could > > > cause bugs. One should never underestimate the possibility that there's > > > some fs that somehow would get into trouble because of odd behavior. > > > > > > So switching each fs over to generate a s_uuid seems the prudent thing > > > to do. Doing it the hard way also forces us to make sure that each > > > filesystem can deal with this. > > > > > > It seems that for pseudo fses we can just allocate a new s_uuid for each > > > instance. So each tmpfs instance - like your patch did - would just get > > > a new s_uuid. > > > > > > For kernel internal filesystems - mostly those that use init_pseudo - > > > the s_uuid would remain stable until the next reboot when it is > > > regenerated. > > > > > > > I am fine with opt-in for every fs as long as we do not duplicate > > boilerplate code. > > An FS_ flag could be a simple way to opt-in for this generic behavior. > > > > > Looking around just a little there's some block-backed fses like fat > > > that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll > > > mean that the f_fsid isn't generated based on the s_uuid. That should be > > > ok though and shouldn't matter to userspace. > > > > > > Afterwards we could probably lift the ext4 and xfs specific ioctls to > > > retrieve the s_uuid into a generic ioctl to allow userspace to get the > > > s_uuid. > > > > > > That's my thinking without having crawled to all possible corner > > > cases... Also needs documenting that s_uuid is not optional anymore and > > > explain the difference between pseudo and device-backed fses. I hope > > > that's not completely naive... > > > > > > > I don't think that the dichotomy of device-backed vs. pseudo is enough > > to describe the situation. > > > > I think what needs to be better documented and annotated is what type > > of fsnotify services can be expected to work on a given fs. > > You're looking at this solely from the angle of fanotify. In my earier > message I was looking at this as something that is generally useful. > Fanotify uses the s_uuid and f_fsid but they have value independent of > this. > Right. Overlayfs to name another internal user of s_uuid. and it would be useful for exportfs. Currently, userspace needs to workaround this by assigning fsid= manually in /etc/exports or by querying the uuid of the blockdev within libblkid. > > > > Jan has already introduced FS_DISALLOW_NOTIFY_PERM to opt-out > > of permission events (for procfs). > > That sounds like a decent solution. > > > > > Perhaps this could be generalized to s_type->fs_notify_supported_events > > or s_type->fs_notify_supported_features. > > > > For example, if an fs opts-in to FAN_REPORT_FID, then it gets an auto > > allocated s_uuid and f_fsid if it did not fill them in fill_super and in statfs > > This appears a layering violation to me. The s_uuid should be allocated > when the superblock is created just like tmpfs does it and not > retroactively/lazily when fanotify on the filesystem is reported. That's not what I meant. Extending the NFS file handles to fs object identification in a generic concept - it does not serve only fanotify. For fanotify, I deliberately chose to report object information that is available to userspace via other UAPIs. What I meant was: * filesystem can opt-in for exporting file id's to userspace, either with .fs_flags = FS_EXPORT_FID or with a new export_op method as Jan suggested, e.g.: export_op.encode_fid = generic_encode_fid64; With this opt-in, filesystem objects are more uniquely identified, so: * vfs enforces non empty s_uuid and non empty f_fsid * name_to_handle_at(....,AT_HANDLE_FID) returns a file handle for ID purpose that may (or may not) be useful in open_by_handle_at() even for fs that do not support NFS export * fanotify allows FAN_ERPORT_FID even with the absence of NFS export support I think we are all thinking more or less about the same solution. This is the time for me to stop writing emails and start writing patches so we have a better ground for discussion on the details... Thanks, Amir.