Re: [RFC][PATCH] fanotify: introduce filesystem view mark

Christian Brauner <christian.brauner@xxxxxxxxxx> · Wed, 19 May 2021 11:31:56 +0200

On Tue, May 18, 2021 at 07:02:28PM +0300, Amir Goldstein wrote:
> > > > > > 2. data sharing among containers or among the host and containers etc.
> > > > > > The most common use-case is to share data from the host with the
> > > > > > container such as a download folder or the Linux folder on ChromeOS.
> > > > > > Most container managers will simly re-use the container's userns for
> > > > > > that too. More complex cases arise where data is shared between
> > > > > > containers with different idmappings then often a separate userns will
> > > > > > have to be used.
> > > > >
> > > > > OK, but if say on ChromeOS you copy something to the Linux folder by app A
> > > > > (say file manager) and containerized app B (say browser) watches that mount
> > > >
> > > > For ChromeOS it is currently somewhat simple since they currently only
> > > > allow a single container by default. So everytime you start an app in
> > > > the container it's the same app so they all write to the Linux Files
> > > > folder through the same container. (I'm glossing over a range of details
> > > > but that's not really relevant to the general spirit of the example.).
> > > >
> > > >
> > > > > for changes with idmap-filtered mark, then it won't see notification for
> > > > > those changes because A presumably runs in a different namespace than B, am
> > > > > I imagining this right? So mark which filters events based on namespace of
> > > > > the originating process won't be usable for such usecase AFAICT.
> > > >
> > > > Idmap filtered marks won't cover that use-case as envisioned now. Though
> > > > I'm not sure they really need to as the semantics are related to mount
> > > > marks.
> > >
> > > We really need to refer to those as filesystem marks. They are definitely
> > > NOT mount marks. We are trying to design a better API that will not share
> > > as many flaws with mount marks...
> > >
> > > > A mount mark would allow you to receive events based on the
> > > > originating mount. If two mounts A and B are separate but expose the
> > > > same files you wouldn't see events caused by B if you're watching A.
> > > > Similarly you would only see events from mounts that have been delegated
> > > > to you through the idmapped userns. I find this acceptable especially if
> > > > clearly documented.
> > > >
> > >
> > > The way I see it, we should delegate all the decisions over to userspace,
> > > but I agree that the current "simple" proposal may not provide a good
> > > enough answer to the case of a subtree that is shared with the host.
> >
> > I was focussed on what happens if you set an idmapped filtered mark for
> > a container for a set of files that is exposed to another container via
> > another idmapped mount. And it seemed to me that it was ok if the
> > container A doesn't see events from container B.
> >
> > You seem to be looking at this from the host's perspective right now
> > which is interesting as well.
> >
> > >
> > > IMO, it should be a container manager decision whether changes done by
> > > the host are:
> > > a) Not visible to containerized application
> >
> > Yes, that seems ok.
> >
> > > b) Watched in host via recursive inode watches
> > > c) Watched in host by filesystem mark filtered in userspace
> > > d) Watched in host by an "noop" idmapped mount in host, through
> > >      which all relevant apps in host access the shared folder
> >
> > So b)-d) are concerned with the host getting notifcations for changes
> > done from any container that uses a given set of files possibly through
> > different mounts.
> >
> 
> My perception was that container manager knows about all the idmapped
> mounts that share the same folder, so when container A requests to watch

Yes, the container manager would know this.

> the shared folder, container manager sets idmapped marks on *all* the
> idmapped mounts and when a new container is started which also maps
> the shared folder, idmapped marks are added to *all* the fanotify groups
> that the container manager currently maintains, which are interested in the
> shared folder.

Ah, that part wasn't spelled out in the previous mail. Yes, that would
work.

> 
> With (d) this can still be the model.
> With (c) it still makes sense to save filtering cycles in userspace in case
> events originate inside containers.
> With (b) there doesn't seem to be any need for the idmapped filtered marks
> at all.

Right, I wasn't sure at first whether you listed this as mutually
exclusive implementations. But I see now that these are choices the
manager can make about how to implement those watches. Thanks for the
clarification.

Christian