On Thu, Jul 18, 2024 at 04:16:45PM +0200, Roberto Sassu wrote: > On Sat, 2024-07-06 at 16:56 +0200, Mickaël Salaün wrote: > > On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote: > > > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote: > > > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > > > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > > > > > Such a secure environment can be achieved with an appropriate access > > > > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > > > > configuration) and an enlighten ld.so checking that libraries are > > > > > > allowed for execution e.g., to protect against illegitimate use of > > > > > > LD_PRELOAD. > > > > > > > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > > > > environment variables), but that is outside the scope of the kernel. > > > > > > > > > > If the threat model includes an attacker sitting at a shell prompt, we > > > > > need to be very careful about how process perform enforcement. E.g. even > > > > > on a locked down system, if an attacker has access to LD_PRELOAD or a > > > > > > > > LD_PRELOAD should be OK once ld.so will be patched to check the > > > > libraries. We can still imagine a debug library used to bypass security > > > > checks, but in this case the issue would be that this library is > > > > executable in the first place. > > > > > > Ah yes, that's fair: the shell would discover the malicious library > > > while using AT_CHECK during resolution of the LD_PRELOAD. > > > > That's the idea, but it would be checked by ld.so, not the shell. > > > > > > > > > > seccomp wrapper (which you both mention here), it would be possible to > > > > > run commands where the resulting process is tricked into thinking it > > > > > doesn't have the bits set. > > > > > > > > As explained in the UAPI comments, all parent processes need to be > > > > trusted. This meeans that their code is trusted, their seccomp filters > > > > are trusted, and that they are patched, if needed, to check file > > > > executability. > > > > > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > > > on Chrome OS, or even systemd on regular distros. In theory, this should > > > be handled via other ACLs. > > > > Processes running with untrusted seccomp filter should be considered > > untrusted. It would then make sense for these seccomp filters/programs > > to be considered executable code, and then for minijail and systemd to > > check them with AT_CHECK (according to the securebits policy). > > > > > > > > > > But this would be exactly true for calling execveat(): LD_PRELOAD or > > > > > seccomp policy could have it just return 0. > > > > > > > > If an attacker is allowed/able to load an arbitrary seccomp filter on a > > > > process, we cannot trust this process. > > > > > > > > > > > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via > > > > > open(), as was originally designed with O_MAYEXEC. Because then > > > > > enforcement is gated by the kernel -- the process does not get a file > > > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > > > > > doing. > > > > > > > > Being able to check a path name or a file descriptor (with the same > > > > syscall) is more flexible and cover more use cases. > > > > > > If flexibility costs us reliability, I think that flexibility is not > > > a benefit. > > > > Well, it's a matter of letting user space do what they think is best, > > and I think there are legitimate and safe uses of path names, even if I > > agree that this should not be used in most use cases. Would we want > > faccessat2(2) to only take file descriptor as argument and not file > > path? I don't think so but I'd defer to the VFS maintainers. > > > > Christian, Al, Linus? > > > > Steve, could you share a use case with file paths? > > > > > > > > > The execveat(2) > > > > interface, including current and future flags, is dedicated to file > > > > execution. I then think that using execveat(2) for this kind of check > > > > makes more sense, and will easily evolve with this syscall. > > > > > > Yeah, I do recognize that is feels much more natural, but I remain > > > unhappy about how difficult it will become to audit a system for safety > > > when the check is strictly per-process opt-in, and not enforced by the > > > kernel for a given process tree. But, I think this may have always been > > > a fiction in my mind. :) > > > > Hmm, I'm not sure to follow. Securebits are inherited, so process tree. > > And we need the parent processes to be trusted anyway. > > > > > > > > > > And this thinking also applies to faccessat() too: if a process can be > > > > > tricked into thinking the access check passed, it'll happily interpret > > > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > > > > > is being checked seems substantially safer to me... > > > > > > > > If attackers can filter execveat(2), they can also filter open(2) and > > > > any other syscalls. In all cases, that would mean an issue in the > > > > security policy. > > > > > > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass > > > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true. > > > > > > I guess it does become morally equivalent. > > > > > > Okay. Well, let me ask about usability. Right now, a process will need > > > to do: > > > > > > - should I use AT_CHECK? (check secbit) > > > - if yes: perform execveat(AT_CHECK) > > > > > > Why not leave the secbit test up to the kernel, and then the program can > > > just unconditionally call execveat(AT_CHECK)? > > > > That was kind of the approach of the previous patch series and Linus > > wanted the new interface to follow the kernel semantic. Enforcing this > > kind of restriction will always be the duty of user space anyway, so I > > think it's simpler (i.e. no mix of policy definition, access check, and > > policy enforcement, but a standalone execveat feature), more flexible, > > and it fully delegates the policy enforcement to user space instead of > > trying to enforce some part in the kernel which would only give the > > illusion of security/policy enforcement. > > A problem could be that from IMA perspective there is no indication on > whether the interpreter executed or not execveat(). Sure, we can detect > that the binary supports it, but if the enforcement was > enabled/disabled that it is not recorded. We should assume that if the interpreter call execveat+AT_CHECK, it will enforce restrictions according to its securebits. > > Maybe, setting the process flags should be influenced by the kernel, > for example not allowing changes and enforcing when there is an IMA > policy loaded requiring to measure/appraise scripts. LSMs can set the required securebits per task/interpreter according to their policies. > > Roberto > > > > > > > Though perhaps the issue here is that an execveat() EINVAL doesn't > > > tell the program if AT_CHECK is unimplemented or if something else > > > went wrong, and the secbit prctl() will give the correct signal about > > > AT_CHECK availability? > > > > This kind of check could indeed help to identify the issue. > >