[PATCH v2 0/3] Reduce impact of overlayfs backing files fake path

Amir Goldstein <amir73il@xxxxxxxxx> · Sat, 7 Oct 2023 11:44:30 +0300

Hi Christian,

This is another step in the multi step plan to reduce the vulnerability
of filesystems code to overlayfs backing files whose f_path and f_inode
are not on the same sb.

History:
--------
When overlayfs was first merged, overlayfs files of regular files and
directories, the ones that are installed in file table, had a "fake" path,
namely, f_path is the overlayfs path and f_inode is the "real" inode on
the underlying filesystem.

At that time, all filesystem syscalls, generic vfs code and any filesystem
code (among the fs that are supported as overlayfs underlying layers) had
to be aware of the possibility of a file with "fake" path and had to use
the helper file_dentry() to get the "real" dentry and the helper
locks_inode() to get to the overlayfs inode.
At that time, there was no way to get the "real" path.

This situation was fragile because many filesystem developers were not
aware of having to deal with file with "fake" path.
Admittedly, it wasn't very well documented either.

Backing files:
--------------
The first big step to reduce the impact of files with fake path was in
v4.19 that introduced d1d04ef8572b ("ovl: stack file ops").  Since
v4.19, f_inode of the overlayfs files that are installed in file table
are overlayfs inodes, so those files are no longer odd.

As a result of this change, a lot of syscalls code, which take fd as an
argument, is no longer exposed to files with "fake" path and the helper
file_locks() was no longer needed.

However, since v4.19, overlayfs uses internal backing files with "fake"
path, so generic vfs code that overlayfs calls with the backing files and
filesystem code still needs to be aware of files with "fake" path.

The one place where the internal backing files are "exposed" to users
is when users mmap overlayfs files, the backing file (and not the
overlayfs file) is stored in vm_file. The reason that overlayfs backing
files use a "fake" path is so that /proc/<pid>/maps will disaply the
overlayfs file path to users.

Recently:
---------
In v6.5, we took another small step by introducing of the backing_file
container and the file_real_path() helper.  This change allowed vfs and
filesystem code to get the "real" path of an overlayfs backing file.
With this change, we were able to make fsnotify work correctly and
report events on the "real" filesystem objects that were accessed via
overlayfs.

This method works fine, but it still leaves the vfs vulnerable to new
code that is not aware of files with fake path.  A recent example is
commit db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version").
This commit uses direct referencing to f_path in IMA code that otherwise
uses file_inode() and file_dentry() to reference the filesystem objects
that it is measuring.

Coming up:
----------
This patch set actually implements my initial proposal [1] that was
proposed for v6.5 - instead of having filesystem code opt-in to get the
"real" path, have generic code opt-in for the "fake" path in the few
places that it is needed.

It is possible that I missed a few places that need opt-in for "fake"
path, but in most likelihood, a missed opt-in to "fake" path would just
result in printing the "real" path to user and if %pD is used for
printing, that will probably make no difference anyway.

Is it far more likely that new filesystems code that does not use the
file_dentry() and file_real_path() helpers will end up causing crashes
or averting LSM/audit rules if we keep the "fake" path exposed by default.

Future:
-------
This change already makes file_dentry() moot, but for now we did not
change this helper just added a WARN_ON() in ovl_d_real() to catch if we
have made any wrong assumptions.

After the dust settles on this change, we can make file_dentry() a plain
accessor and we can drop the inode argument to ->d_real().

Testing:
--------
As usual, I ran fstests with overlayfs over xfs and all the LTS tests
that test fsnotify + overlayfs.

I have limited testing ability for audit/IMA/LSM modules, so we will
have to rely on other people and bots testing of linux-next.
Syzbot has been known to be very obsessive about reporting bugs of
overlayfs + IMA.

Merging:
--------
The branch that I tested [2] is based on both the stable vfs.mount.write
branch and vfs.misc of the moment.

If you agree to stage these patches in vfs tree, they would need
to be based on both vfs.mount.write and the file_free_rcu patches.
So perhaps split the file_free_rcu patches out of vfs.misc and into a
vfs.file topic branch that is based on the stable vfs.mount.write branch
and add my patches on top?

I don't need vfs.file to be stable, I understand that the file_free_rcu()
patches may still change, but since I would like to test overlayfs-next
together with the "fake" path patches, it would be nice if the vfs.file
branch would be more stable than vfs.misc usually is.

Thanks,
Amir.

Changes since v1:
- backing_file (fake_file rebranded) container already merged
- fsnotify support for backing files already merged
- Take mnt_writers refcount on the "real" path
- Rename file_fake_path() => file_user_path()
- No need to use file_user_path() for exe_file path access
- No need to use file_user_path() in audit/LSM code
- Added file_user_path() in some tracing/debugging code

[1] https://lore.kernel.org/r/20230609073239.957184-1-amir73il@xxxxxxxxx/
[2] https://github.com/amir73il/linux/commits/ovl_fake_path

Amir Goldstein (3):
  fs: get mnt_writers count for an open backing file's real path
  fs: create helper file_user_path() for user displayed mapped file path
  fs: store real path instead of fake path in backing file f_path

 arch/arc/kernel/troubleshoot.c |  3 +-
 fs/file_table.c                | 12 ++++----
 fs/internal.h                  | 15 ++++++++--
 fs/open.c                      | 55 +++++++++++++++++++++++-----------
 fs/overlayfs/super.c           | 11 ++++++-
 fs/proc/base.c                 |  2 +-
 fs/proc/nommu.c                |  2 +-
 fs/proc/task_mmu.c             |  4 +--
 fs/proc/task_nommu.c           |  2 +-
 include/linux/fs.h             | 27 ++++++++++-------
 include/linux/fsnotify.h       |  3 +-
 kernel/trace/trace_output.c    |  2 +-
 12 files changed, 92 insertions(+), 46 deletions(-)

-- 
2.34.1