Re: [PATCH] overlayfs: port all superblock creation logging to fsopen logs

Aleksa Sarai <cyphar@xxxxxxxxxx> · Tue, 26 Nov 2024 18:25:08 +1100

On 2024-11-13, Karel Zak <kzak@xxxxxxxxxx> wrote:
> On Thu, Nov 07, 2024 at 02:09:19AM GMT, Aleksa Sarai wrote:
> > On 2024-11-06, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > On Wed, Nov 6, 2024 at 12:00 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Nov 6, 2024 at 10:59 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Nov 06, 2024 at 02:09:58PM +1100, Aleksa Sarai wrote:
> > > > > > overlayfs helpfully provides a lot of of information when setting up a
> > > > > > mount, but unfortunately when using the fsopen(2) API, a lot of this
> > > > > > information is mixed in with the general kernel log.
> > > > > >
> > > > > > In addition, some of the logs can become a source of spam if programs
> > > > > > are creating many internal overlayfs mounts (in runc we use an internal
> > > > > > overlayfs mount to protect the runc binary against container breakout
> > > > > > attacks like CVE-2019-5736, and xino_auto=on caused a lot of spam in
> > > > > > dmesg because we didn't explicitly disable xino[1]).
> > > > > >
> > > > > > By logging to the fs_context, userspace can get more accurate
> > > > > > information when using fsopen(2) and there is less dmesg spam for
> > > > > > systems where a lot of programs are using fsopen("overlay"). Legacy
> > > > > > mount(2) users will still see the same errors in dmesg as they did
> > > > > > before (though the prefix of the log messages will now be "overlay"
> > > > > > rather than "overlayfs").
> > > >
> > > > I am not sure about the level of risk in this format change.
> > > > Miklos, WDYT?
> > > >
> > > > > >
> > > > > > [1]: https://bbs.archlinux.org/viewtopic.php?pid=2206551
> > > > > >
> > > > > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > > > > > ---
> > > > >
> > > > > To me this sounds inherently useful! So I'm all for it.
> > > > >
> > > >
> > > > [CC: Karel]
> > > >
> > > > I am quite concerned about this.
> > > > I have a memory that Christian suggested to make this change back in
> > > > the original conversion to new mount API, but back then mount tool
> > > > did not print out the errors to users properly and even if it does
> > > > print out errors, some script could very well be ignoring them.
> > 
> > I think Christian mentioned this at LSF/MM (or maybe LPC), but it seems
> > that util-linux does provide the log information now in the case of
> > fsconfig(2) errors:
> > 
> > 	% strace -e fsopen,fsconfig mount -t overlay -o userxattr=str x /tmp/a
> > 	fsopen("overlay", FSOPEN_CLOEXEC)       = 3
> > 	fsconfig(3, FSCONFIG_SET_STRING, "source", "foo", 0) = 0
> > 	fsconfig(3, FSCONFIG_SET_STRING, "userxattr", "str", 0) = -1 EINVAL (Invalid argument)
> > 	mount: /tmp/a: fsconfig system call failed: overlay: Unexpected value for 'userxattr'.
> > 		   dmesg(1) may have more information after failed mount system call.
> > 
> > (Using the current HEAD of util-linux -- openSUSE's util-linux isn't
> > compiled with support for fsopen apparently.)
> 
> After failed mount-related syscalls, libmount reads messages prefixed
> with "e " from the file descriptor created by fdopen(). These messages
> are later printed by mount(8).
> 
> mount(8) or libmount does not read anything from kmesg.
> 
> > However, it doesn't output any of the info-level ancillary
> > information if there were no errors.
> 
> This is the expected default behavior. mount(8) does not print any
> additional information.
> 
> We can enhance libmount to read and print other messages on stdout if
> requested by the user. For example, the mount(8) command has a
> --verbose option that is currently only used by some /sbin/mount.<type>
> helpers, but not by mount(8) itself. We can improve this and use it in
> libmount to read and print info-level messages.
> 
> I can prepare a libmount/mount(8) patch for this.

This sounds like a good idea to me.

> > So there will definitely be some loss of
> > information for pr_* logs that don't cause an actual error (which is a
> > little unfortunate, since that is the exact dmesg spam that caused me to
> > write this patch).
> > 
> > I could take a look at sending a patch to get libmount to output that
> > information, but that won't help with the immediate issue, and this
> > doesn't help with the possible concern with some script that scrapes
> > dmesg. (Though I think it goes without saying that such scripts are kind
> > of broken by design -- since unprivileged users can create overlayfs
> > mounts and thus spam the kernel log with any message, there is no
> > practical way for a script to correctly get the right log information
> > without using the new mount API's logging facilities.)
> 
> > I can adjust this patch to only include the log+return-an-error cases,
> > but that doesn't really address your primary concern, I guess.
> > 
> > > > My strong feeling is that suppressing legacy errors to kmsg should be opt-in
> > > > via the new mount API and that it should not be the default for libmount.
> > > > IMO, it is certainly NOT enough that new mount API is used by userspace
> > > > as an indication for the kernel to suppress errors to kmsg.
>  
> For me, it seems like we are mixing two things together.
> 
> kmesg is a *log*, and tools like systemd read and save it. It is used
> for later issue debugging or by log analyzers. This means that all
> relevant information should be included.
> 
> The stderr/stdout output from tools such as mount(8) is simply
> feedback for users or scripts, and informational messages are just
> hints. They should not be considered a replacement for system logging
> facilities. The same applies to messages read from the new mount API;
> they should not be a replacement for system logs.
> 
> In my opinion, it is acceptable to suppress optional and unimportant
> messages and not save them into kmesg. However, all other relevant
> messages should be included regardless of the tool or API being used.

For warning or error messages, this makes sense -- though I think the
"least spammy" option would be that the logs are output to kmesg if
userspace closes the fscontext fd without reading the logs. That should
catch programs that miss log information, without affecting programs
that do read the logs (and do whatever they feel is appropriate with
them). That would be some reasonable default behaviour, and users could
explicitly opt into a verbose mode.

For informational or debug messages, I feel that the default should be
that we want to avoid outputting to kmesg when using the new mount API
since the information is non-critical and the only way of associating
the information is using the fscontext log. But if we had this "only log
on close if not read" behaviour, I think having the same behaviour for
all log messages would still work and would be more consistent.

> Additionally, it should be noted that mount(8)/libmount is only a
> userspace tool and is not necessary for mounting filesystems. The
> kernel should not rely on libmount behavior; there are other tools
> available such as busybox.

Sure, but by switching to the new mount API you are buying into
different behaviour for error logs (if only for the generic VFS ones),
regardless of what kind of program you are.

> > I can see an argument for some kind of MS_SILENT analogue for
> > fsconfig(), though it will make the spam problem worse until programs
> > migrate to setting this new flag.
>  
> Yes, the ideal solution would be to have mount options that can
> control this behavior. This would allow users to have control over it
> and save their settings to fstab, as well as keep it specific to the
> mount node.
> 
> > Also, as this is already an issue ever since libmount added support for
> > the new API (so since 2.39 I believe?), I think it would make just as
> > much sense for this flag to be opt-in -- so libmount could set the
> > "verbose" or "kmsglog" flag by default but most normal programs would
> > not get the spammy behaviour by default.
> 
> I prefer if the default behavior is defined by the kernel, rather than
> by userspace tools like libmount. If we were to automatically add any
> mount options through libmount, it would make it difficult to coexist
> with settings in fstab, etc. It's always better to have transparency
> and avoid any hidden factors in the process.

Right, my suggestion was that verbose should be opt-in precisely because
wanting to output to kmesg when using the new mount API is something
that only really makes sense for libmount and similar tools and so
should be opt-in rather than opt-out.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Attachment:
signature.asc

Description: PGP signature