> > Hmm I guess "crash safety" is not well defined.
> > You and I were talking about "system crash" and indeed, this was
> > my only concern with kernel implementation of overlayfs watch.
> >
> > But with userspace HSM service, how can we guarantee that
> > modifications did not happen while the service is down?
> >
> > I don't really have a good answer for this.
>
> Very good point!
>
> > Thinking out loud, we would somehow need to make the default
> > permission deny for all modifications, maybe through some mount
> > property (e.g. MOUNT_ATTR_PROT_READ), causing the pre-write
> > hooks to default to EROFS if there is no "vfs filter" mount mark.
> >
> > Then it will be possible to expose a "safe" mount to users, where
> > modifications can never go unnoticed even when HSM service
> > crashes.
>
> Yeah, something like this. Although the bootstrap of this during mount may
> be a bit challenging. But maybe not.
>

I don't think so.
As I wrote on several occasions, some of the current HSMs are
implemented as FUSE filesystems and require a mount.
As I imagine an HSM system (and as our in-house system works), there
is a filesystem containing populated and unpopulated files that the
admin can access without any filters, and there is a mount that is
exposed to users where the filtering and on-demand population happen.

I am less worried about bringup.
My HttpDirFS POC already does a mount move of a marked mount on startup.
My concern was how to handle a dying fanotify group safely.

> Also I'm thinking about other usecases - for HSM I agree we essentially
> need to take the FS down if the userspace counterpart is not working. What
> about other persistent change log usecases? Do we mandate that there is
> only one "persistent change log" daemon in the system (or per filesystem?)
> and that must be running or we take the filesystem down? And anybody who
> wants reliable notifications needs to consume service of this daemon?
Yes, I envision a single systemd-fsmonitor daemon (or an instance per sb)
that can handle subscriptions to changes on a subtree and deal with the
permissions of dispatching events on subtrees.

To answer your question, I think the bare minimum that we need to
provide is a property of the mount (probably an event mask) that requires
at least one active fanotify vfs filter to allow certain permission
events to go through.

I think it would make sense to allow a single FAN_CLASS_VFS_FILTER group
mark per sb and one per mount. If use cases that require more vfs filters
per sb/mount arise, we can revisit that restriction later.

Thanks,
Amir.
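A minimal userspace model of the policy discussed above, offered only as a sketch under stated assumptions: a mount carries a "protected" event mask, a protected modification is denied with EROFS unless a live vfs filter is attached, and at most one filter may be attached per mount (and per sb). None of the names here (HSM_PRE_MODIFY, HSM_PRE_ACCESS, struct filter_group, struct mount_model, struct sb_model, check_permission_event(), attach_mount_filter()) are real fanotify or VFS identifiers; they are hypothetical. It also assumes that a live sb-level filter satisfies the mount's requirement and that a dead filter may be replaced by a restarted daemon.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event bits for this model; NOT real fanotify flags. */
#define HSM_PRE_MODIFY (1u << 0)
#define HSM_PRE_ACCESS (1u << 1)

/* Stands in for a FAN_CLASS_VFS_FILTER group; alive == false models
 * a group whose listener (the HSM daemon) has died. */
struct filter_group {
	int id;
	bool alive;
};

struct sb_model {
	struct filter_group *sb_filter;   /* at most one vfs filter per sb */
};

struct mount_model {
	unsigned int protected_mask;      /* events that require a live filter */
	struct filter_group *mnt_filter;  /* at most one vfs filter per mount */
	struct sb_model *sb;
};

static bool filter_active(const struct filter_group *g)
{
	return g != NULL && g->alive;
}

/* Returns 0 to let the event proceed (to the filter, if any), or -EROFS
 * when the mount demands a filter for this event and none is alive. */
static int check_permission_event(const struct mount_model *m, unsigned int event)
{
	assert(m != NULL && m->sb != NULL);
	if (!(m->protected_mask & event))
		return 0;
	/* Assumption: a live sb-level filter also satisfies the mount. */
	if (filter_active(m->mnt_filter) || filter_active(m->sb->sb_filter))
		return 0;
	return -EROFS;
}

/* Enforces the single-filter-per-mount restriction.
 * Assumption: a dead filter may be replaced by a restarted daemon. */
static int attach_mount_filter(struct mount_model *m, struct filter_group *g)
{
	if (filter_active(m->mnt_filter))
		return -EEXIST;
	m->mnt_filter = g;
	return 0;
}
```

In this model a crashed daemon leaves the mount in a fail-closed state: modifications return -EROFS until a new filter attaches, so changes cannot slip through unnoticed while the service is down.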