Re: [RFC][PATCH] fanotify: introduce filesystem view mark

On Mon 17-05-21 15:45:29, Amir Goldstein wrote:
> On Mon, May 17, 2021 at 12:09 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Sat 15-05-21 17:28:27, Amir Goldstein wrote:
> > > On Fri, May 14, 2021 at 4:56 PM Christian Brauner
> > > <christian.brauner@xxxxxxxxxx> wrote:
> > > > > for changes with idmap-filtered mark, then it won't see notification for
> > > > > those changes because A presumably runs in a different namespace than B, am
> > > > > I imagining this right? So mark which filters events based on namespace of
> > > > > the originating process won't be usable for such usecase AFAICT.
> > > >
> > > > Idmap filtered marks won't cover that use-case as envisioned now. Though
> > > > I'm not sure they really need to as the semantics are related to mount
> > > > marks.
> > >
> > > We really need to refer to those as filesystem marks. They are definitely
> > > NOT mount marks. We are trying to design a better API that will not share
> > > as many flaws with mount marks...
> >
> > I agree. I was pondering this use case exactly because the problem of
> > changes done through mount A and visible through mount B not generating
> > a notification was a source of complaints about fanotify in the past and
> > the reason why you came up with filesystem marks.
> >
> > > > A mount mark would allow you to receive events based on the
> > > > originating mount. If two mounts A and B are separate but expose the
> > > > same files you wouldn't see events caused by B if you're watching A.
> > > > Similarly you would only see events from mounts that have been delegated
> > > > to you through the idmapped userns. I find this acceptable especially if
> > > > clearly documented.
> > > >
> > >
> > > The way I see it, we should delegate all the decisions over to userspace,
> > > but I agree that the current "simple" proposal may not provide a good
> > > enough answer to the case of a subtree that is shared with the host.
> > >
> > > IMO, it should be a container manager decision whether changes done by
> > > the host are:
> > > a) Not visible to containerized application
> > > b) Watched in host via recursive inode watches
> > > c) Watched in host by filesystem mark filtered in userspace
> > > d) Watched in host by a "noop" idmapped mount in host, through
> > >      which all relevant apps in host access the shared folder
> > >
> > > We can later provide the option of "subtree filtered filesystem mark"
> > > which can be choice (e). It will incur performance overhead on the system
> > > that is higher than option (d) but lower than option (c).
> >
> > But won't b) and c) require the container manager to inject events into the
> > event stream observed by the containerized fanotify user? Because in both
> > these cases the manager needs to consume generated events and decide what
> > to do with them.
> >
> 
> With (b) manager does not need to inject events.
> The manager intercepts fanotify_init() and returns an actual fanotify group fd
> in the requesting process's fd table.
> 
> Later, when manager intercepts fanotify_mark() with idmapped mark
> request, manager can take care of setting up the recursive inode watches,
> but the requesting process will get the events, because it has a clone of
> the fanotify group fd.

Well, but for recursive inode watches to function, you also have to process
the stream of events to detect created dirs etc. Also you may have to
remove (e.g. directory) events the original user didn't ask for...
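
To make that bookkeeping concrete, here is a rough sketch of the per-event
decision a manager would face. It is not taken from any posted patch:
classify_event() and struct event_verdict are hypothetical names, and the
sketch assumes the manager knows the mask the containerized user originally
asked for. Only the FAN_* constants come from <sys/fanotify.h>. How the new
directory is resolved and marked is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/fanotify.h>

/* Hypothetical result type: what to do with one event. */
struct event_verdict {
	bool add_watch;	/* a directory appeared, so it needs its own mark */
	bool forward;	/* the original user asked for this event */
};

/*
 * Hypothetical helper. 'event_mask' is the mask of the event that was read,
 * 'requested_mask' is what the original user passed to fanotify_mark().
 */
static struct event_verdict classify_event(uint64_t event_mask,
					   uint64_t requested_mask)
{
	struct event_verdict v;
	bool is_dir = (event_mask & FAN_ONDIR) != 0;
	uint64_t type = event_mask & ~(uint64_t)FAN_ONDIR;

	/* A directory created in, or moved into, the tree must be watched too. */
	v.add_watch = is_dir && (type & (FAN_CREATE | FAN_MOVED_TO)) != 0;

	/* Forward only requested event types; directory events need FAN_ONDIR. */
	v.forward = (type & requested_mask) != 0 &&
		    (!is_dir || (requested_mask & FAN_ONDIR) != 0);
	return v;
}
```

For example, a FAN_CREATE | FAN_ONDIR event seen on behalf of a user who asked
only for FAN_MODIFY yields add_watch set and forward clear, which is the
"events the original user didn't ask for" case.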

> With (c), I guess the intercepted fanotify_init() can return an open pipe
> and proxy the stream of events read from the actual fanotify fd filtering
> out the events.

Yes, that's what I thought about. But it isn't 100% transparent (e.g.
fdinfo will be different).
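
For illustration, the filtering step of such a proxy could look roughly like
the sketch below. path_in_subtree() is a hypothetical helper, and it assumes
the proxy has already resolved each event to an absolute path (for instance by
readlink() on /proc/self/fd/<event fd>); reading from the real fanotify fd and
writing to the pipe are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if 'path' is 'root' itself or lies below it. */
static bool path_in_subtree(const char *root, const char *path)
{
	size_t n;

	assert(root && path);
	n = strlen(root);

	/* Ignore trailing slashes so "/srv/shared/" and "/srv/shared" agree. */
	while (n > 1 && root[n - 1] == '/')
		n--;

	if (strncmp(root, path, n) != 0)
		return false;

	/* A root of "/" contains every absolute path. */
	if (n == 1 && root[0] == '/')
		return true;

	/* Require a component boundary: "/srv/shared" must not match "/srv/shared2". */
	return path[n] == '\0' || path[n] == '/';
}
```

The proxy would pass through only events for which this returns true. Note it
is a purely textual check and says nothing about bind mounts or renames racing
with the lookup.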

> I hope we can provide some form of kernel subtree filtering so
> userspace will not need to resort to this sort of practice.

I hope as well :)

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


