On Mon, Feb 19, 2024 at 1:01 PM Jan Kara <jack@xxxxxxx> wrote: > > On Thu 15-02-24 17:40:07, Amir Goldstein wrote: > > > > Last time we discussed this the conclusion was an API of a group-less > > > > default mask, for example: > > > > > > > > 1. fanotify_mark(FAN_GROUP_DEFAULT, > > > > FAN_MARK_ADD | FAN_MARK_MOUNT, > > > > FAN_PRE_ACCESS, AT_FDCWD, path); > > > > 2. this returns -EPERM for access until some group handles FAN_PRE_ACCESS > > > > 3. then HSM is started and subscribes to FAN_PRE_ACCESS > > > > 4. and then the mount is moved or bind mounted into a path exported to users > > > > > > Yes, this was the process I was talking about. > > > > > > > It is a simple solution that should be easy to implement. > > > > But it does not involve "register the HSM app with the filesystem", > > > > unless you mean that a process that opens an HSM group > > > > (FAN_REPORT_FID|FAN_CLASS_PRE_CONTENT) should automatically > > > > be given FMODE_NONOTIFY files? > > > > > > Two ideas: What you describe above seems like what the new mount API was > > > intended for? What if we introduced something like an "hsm" mount option > > > which would basically enable calling into pre-content event handlers > > > > I like that. > > I forgot that with my suggestion we'd need a path to setup > > the default mask. > > > > > (for sb without this flag handlers wouldn't be called and you cannot place > > > pre-content marks on such sb). > > > > IMO, that limitation (i.e. inside brackets) is too restrictive. > > In many cases, the user running HSM may not have control over the > > mount of the filesystem (inside containers?). > > It is true that HSM without anti-crash protection is less reliable, > > but I think that it is still useful enough that users will want the > > option to run it (?). > > > > Think of my HttpDirFS demo - it's just a simple lazy mirroring > > of a website. Even with low reliability I think it is useful (?). > > Yeah, ok, makes sense. But for such "unpriviledged" usecases we don't have > a deadlock-free way to fill in the file contents because that requires a > special mountpoint? > True, unless we also keep the FMODE_NONOTIFY event->fd for the simple cases. I'll need to think about this some more. > > > These handlers would return EACCESS unless > > > there's somebody handling events and returning something else. > > > > > > You could then do: > > > > > > fan_fd = fanotify_init() > > > ffd = fsopen() > > > fsconfig(ffd, FSCONFIG_SET_STRING, "source", device, 0) > > > fsconfig(ffd, FSCONFIG_SET_FLAG, "hsm", NULL, 0) > > > rootfd = fsconfig(ffd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) > > > fanotify_mark(fan_fd, FAN_MARK_ADD, ... , rootfd, NULL) > > > <now you can move the superblock into the mount hierarchy> > > > > Not too bad. > > I think that "hsm_deny_mask=" mount options would give more flexibility, > > but I could be convinced otherwise. > > > > It's probably not a great idea to be running two different HSMs on the same > > fs anyway, but if user has an old HSM version installed that handles only > > pre-content events, I don't think that we want this old version if it happens > > to be run by mistake, to allow for unsupervised create,rename,delete if the > > admin wanted to atomically mount a fs that SHOULD be supervised by a > > v2 HSM that knows how to handle pre-path events. > > > > IOW, and "HSM bit" on sb is too broad IMO. > > OK. So "hsm_deny_mask=" would esentially express events that we require HSM > to handle, the rest would just be accepted by default. That makes sense. Yes. > The only thing I kind of dislike is that this ties fanotify API with mount > API. So perhaps hsm_deny_mask should be specified as a string? Like > preaccess,premodify,prelookup,... and transformed into a bitmask only > inside the kernel? It gives us more maneuvering space for the future. > Urgh. I see what you are saying, but this seems so ugly to me. I have a strong feeling that we are trying to reinvent something and that we are going to reinvent it badly. I need to look for precedents, maybe in other OS. I believe that in Windows, there is an API to register as a Cloud Engine Provider, so there is probably a way to have multiple HSMs working on different sections of the filesystem in some reliable crash safe manner. > > > This would elegantly solve the "what if HSM handler dies" problem as well > > > as cleanly handle the setup. And we don't have to come up with a concept of > > > "default mask". > > > > We can still have a mask, it's just about the API to set it up. > > > > > Now we still have the problem how to fill in the filesystem > > > on pre-content event without deadlocking. As I was thinking about it the > > > most elegant solution would IMHO be if the HSM handler could have a private > > > mount flagged so that HSM is excluded from there (or it could place ignore > > > mark on this mount for HSM events). > > > > My HttpDirFS demo does it the other way around - HSM uses a mount > > without a mark mount - but ignore mark works too. > > Yes, the HSM handler is free to setup whatever works for it. I was just > thinking to make sure there is at least one sane way how to do it :) > Yeh, we need to write "best practice" guidelines. > > > I think we've discarded similar ideas > > > in the past because this is problematic with directory pre-content events > > > because security hooks don't get the mountpoint. But what if we used > > > security_path_* hooks for directory pre-content events? > > > > No need for security_path_ * hooks. > > In my POC, the pre-path hooks have the path argument. > > For people who are not familiar with the term, here is man page draft > > for "pre-path" events: > > https://github.com/amir73il/man-pages/commits/fan_pre_path/ > > > > This is an out of date branch from the time that I called them > > FAN_PRE_{CREATE,DELETE,MOVE_*} events: > > https://github.com/amir73il/linux/commit/29c60e4db3068ff2cd7b2c5a73108afb2c19b868 > > > > They are implemented by replacing the mnt_want_write() calls > > with mnt_want_write_{path,parent,parents}() calls. > > > > This was done to make sure that they take the sb write srcu and call > > the pre-path hook before taking sb writers freeze protection. > > Ok, so AFAIU you agree we don't need to rely on FMODE_NONOTIFY for HSM and > can just use access through dedicated mount for filling in the filesystem? > It seems like a decent and simple way to avoid difficult questions, so I will try to start with that... whenever I manage to context switch back ;) Thanks, Amir.