On Wed 20-09-23 11:26:38, Amir Goldstein wrote: > On Mon, Apr 17, 2023 at 7:27 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > My motivation is to close functional gaps between fanotify and inotify. > > > > > > > > > > One of the largest gaps right now is that FAN_REPORT_FID is limited > > > > > to a subset of local filesystems. > > > > > > > > > > The idea is to report fid's that are "good enough" and that there > > > > > is no need to require that fid's can be used by open_by_handle_at() > > > > > because that is a non-requirement for most use cases, unpriv listener > > > > > in particular. > > > > > > > > OK. I'd note that if you report only inode number, you are prone to the > > > > problem that some inode gets freed (file deleted) and then reallocated (new > > > > file created) and the resulting identifier is the same. It can be > > > > problematic for a listener to detect these cases and deal with them. > > > > Inotify does not have this problem at least for some cases because 'wd' > > > > uniquely identifies the marked inode. For other cases (like watching dirs) > > > > inotify has similar sort of problems. I'm muttering over this because in > > > > theory filesystems not having i_generation counter on disk could approach > > > > the problem in a similar way as FAT and then we could just use > > > > FILEID_INO64_GEN for the file handle. > > > > > > Yes, of course we could. > > > The problem with that is that user space needs to be able to query the fid > > > regardless of fanotify. > > > > > > The fanotify equivalent of wd is the answer to that query. > > > > > > If any fs would export i_generation via statx, then FILEID_INO64_GEN > > > would have been my choice. > > > > One problem with making up i_generation (like FAT does it) is that when > > inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will > > change because it's going to have different i_generation. For NFS this is > > annoying but I suppose it mostly does not happen since client's accesses > > tend to keep the inode in memory. For fanotify it could be more likely to > > happen if watching say the whole filesystem and I suppose the watching > > application will get confused by this. So I'm not convinced faking > > i_generation is a good thing to do. But still I want to brainstorm a bit > > about it :) > > > > > But if we are going to change some other API for that, I would not change > > > statx(), I would change name_to_handle_at(...., AT_HANDLE_FID) > > > > > > This AT_ flag would relax this check in name_to_handle_at(): > > > > > > /* > > > * We need to make sure whether the file system > > > * support decoding of the file handle > > > */ > > > if (!path->dentry->d_sb->s_export_op || > > > !path->dentry->d_sb->s_export_op->fh_to_dentry) > > > return -EOPNOTSUPP; > > > > > > And allow the call to proceed to the default export_encode_fh() implementation. > > > Alas, the default implementation encodes FILEID_INO32_GEN. > > > > > > I think we can get away with a default implementation for FILEID_INO64_GEN > > > as long as the former (INO32) is used for fs with export ops without ->encode_fh > > > (e.g. ext*) and the latter (INO64) is used for fs with no export ops. > > > > These default calls seem a bit too subtle to me. I'd rather add explicit > > handlers to filesystems that really want FILEID_INO32_GEN encoding and then > > have a fallback function for filesystems not having s_export_op at all. > > > > But otherwise the proposal to make name_to_handle_at() work even for > > filesystems not exportable through NFS makes sense to me. But I guess we > > need some buy-in from VFS maintainers for this. > > > > Hi Jan, > > I seem to have dropped the ball on this after implementing AT_HANDLE_FID. > It was step one in a larger plan. No problem, I forgot about this as well :) > Christian, Jeff, > > Do you have an objection to this plan: > 1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty > s_export_op and no explicit ->encode_fh() to use an explicit > generic_encode_ino32_gen_fh() > 2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID > to support encoding a (non-NFS) file id on all fs > 3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID > in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN > > > > > > Also I have noticed your workaround with using st_dev for fsid. As I've > > > > checked, there are actually very few filesystems that don't set fsid these > > > > days. So maybe we could just get away with still refusing to report on > > > > filesystems without fsid and possibly fixup filesystems which don't set > > > > fsid yet and are used enough so that users complain? > > > > > > I started going down this path to close the gap with inotify. > > > inotify is capable of watching all fs including pseudo fs, so I would > > > like to have this feature parity. > > > > Well, but with pseudo filesystems (similarly as with FUSE) the notification > > was always unreliable. As in: some cases worked but others did not. I'm not > > sure that is something we should try to replicate :) > > > > So still I'd be interested to know which filesystems we are exactly > > interested to support and whether we are not better off to explicitly add > > fsid support to them like we did for tmpfs. > > > > Since this email, kernfs derivative fs gained fsid as well. > Quoting your audit of remaining fs from another thread: > > > ...As far as I remember > > fanotify should be now able to handle anything that provides f_fsid in its > > statfs(2) call. And as I'm checking filesystems not setting fsid currently are: > > > > afs, coda, nfs - networking filesystems where inotify and fanotify have > > dubious value anyway > > Be that as it may, there may be users that use inotify on network fs > and it even makes a lot of sense in controlled environments with > single NFS client per NFS export (e.g. home dirs), so I think we will > need to support those fs as well. Fair enough. > Maybe the wise thing to do is to opt-in to monitor those fs after all? > Maybe with explicit opt-in to watch a single fs, fanotify group will > limit itself to marks on a specific sb and then a null fsid won't matter? We have virtual filesystems with all sorts of missing or weird notification functionality and we don't flag that in any way. So making a special flag for network filesystems seems a bit arbitrary. I'd just make them provide fsid and be done with it. > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore, > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is > > quite limited. But some special cases could be useful. Adding fsid support > > is the same amount of trouble as for kernfs - a few LOC. In fact, we > > could perhaps add a fstype flag to indicate that this is a filesystem > > without persistent identification and so uuid should be autogenerated on > > mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid. > > This way we could handle all these filesystems with trivial amount of > > effort. > > > > Christian, > > I recall that you may have had reservations on initializing s_uuid > and f_fsid in vfs code? > Does an opt-in fstype flag address your concerns? > Will you be ok with doing the tmpfs/kernfs trick for every fs > that opted-in with fstype flag in generic vfs code? > > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one > > way or the other. > > > > Last but not least, btrfs subvolumes. > They do have an fsid, but it is different from the sb fsid, > so we disallow (even inode) fanotify marks. > > I am not sure how to solve this one, > but if we choose to implement the opt-in fanotify flag for > "watch single fs", we can make this problem go away, along > with the problem of network fs fsid and other odd fs that we > do not want to have to deal with. > > On top of everything, it is a fast solution and it doesn't > involve vfs and changing any fs at all. Yeah, right, forgot about this one. Thanks for reminding me. But this is mostly a kernel internal implementation issue and doesn't seem to be a principial problem so I'd prefer not to complicate the uAPI for this. We could for example mandate a special super_operation for fetching fsid for a dentry for filesystems which don't have uniform fsids over the whole filesystem (i.e., btrfs) and call this when generating event for such filesystems. Or am I missing some other complication? > > > If we can get away with fallback to s_dev as fsid in vfs_statfs() > > > I have no problem with that, but just to point out - functionally > > > it is equivalent to do this fallback in userspace library as the > > > fanotify_get_fid() LTP helper does. > > > > Yes, userspace can workaround this but I was more thinking about avoiding > > adding these workarounds into fanotify in kernel *and* to userspace. > > > > > > > I chose a rather generic name for the flag to opt-in for "good enough" > > > > > fid's. At first, I was going to make those fid's self describing the > > > > > fact that they are not NFS file handles, but in the name of simplicity > > > > > to the API, I decided that this is not needed. > > > > > > > > I'd like to discuss a bit about the meaning of the flag. On the first look > > > > it is a bit strange to have a flag that says "give me a fh, if you don't > > > > have it, give me ino". It would seem cleaner to have "give me fh" kind of > > > > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new > > > > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning > > > > because you want to allow a usecase where watches of filesystems which > > > > don't support filehandles are mixed with watches of filesystems which do > > > > support filehandles in one notification group and getting filehandles is > > > > actually prefered over getting just inode numbers? Do you see real benefit > > > > in getting file handles when userspace has to implement fallback for > > > > getting just inode numbers anyway? > > > > > > > > > > Yes, there is a benefit, because a real fhandle has no-reuse guarantee. > > > > > > Even if we implement the kernel fallback to FILEID_INO64_GEN, it does > > > not serve as a statement from the filesystem that i_generation is useful > > > and in fact, i_generation will often be zero in simple fs and ino will be > > > reusable. > > > > > > Also, I wanted to have a design where a given fs/object always returns > > > the same FID regardless of the init flags. > > > > > > Your question implies that if > > > "userspace has to implement fallback for getting just inode numbers", > > > then it doesn't matter if we report fhandle or inode, but it is not accurate. > > > > > > The fanotify_get_fid() LTP helper always gets a consistent FID for a > > > given fs/object. You do not need to feed it the fanotify init flags to > > > provide a consistent answer. > > > > > > For all the reasons above, I think that a "give me ino'' flag is not useful. > > > IMO, the flag just needs better marketing. > > > This is a "I do not need/intend to open_by_handle flag". > > > Suggestions for a better name are welcome. > > > > I see, yes, these reasons make sense. > > > > > For all I care, we do not need to add an opt-in flag at all. > > > We could simply start to support fs that were not supported before. > > > This sort of API change is very common and acceptable. > > > > > > There is no risk if the user tries to call open_by_handle_at() with the > > > fanotify encoded FID, because in this case the fs is guaranteed to > > > return ESTALE, because fs does not support file handles. > > > > > > This is especially true, if we can get away with seamless change > > > of behavior for vfs_statfs(), because that seamless change would > > > cause FAN_REPORT_FID to start working on fs like fuse that > > > support file handles and have zero fsid. > > > > Yeah. Actually I like the idea of a seamless change to start reporting fsid > > and also to start reporting "fake" handles. In the past we've already > > enabled tmpfs like this... > > > > I am now leaning towards a combination of: > 1. Seamless change of behavior for vfs_statfs() and > name_to_handle_at(..., AT_HANDLE_FID) for the simple cases > using an opt-in fstype flag Ack. > AND > 2. Simple interim fallback for other fs with an opt-in fanotify flag (*) I'm not sold on the special flag yet (see above) ;). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR