Re: Chromium sandbox on LoongArch and statx -- seccomp deep argument inspection again?

Icenowy Zheng <uwu@xxxxxxxxxx> · Sun, 25 Feb 2024 14:51:05 +0800

在 2024-02-24星期六的 19:51 +0800，Huacai Chen写道：
> Hi, Xuerui,
> 
> On Wed, Feb 21, 2024 at 2:10 PM WANG Xuerui <kernel@xxxxxxxxxx>
> wrote:
> > 
> > Hi,
> > 
> > Recently, we -- community LoongArch porters -- have noticed a
> > problem
> > where the Chromium sandbox apparently wants to deny statx [^1] so
> > it
> > could properly inspect arguments after the sandboxed process later
> > falls
> > back to fstat. The reasoning behind the change was not clear in the
> > patch; but we found out it's basically because there's currently
> > not a
> > "fd-only" version of statx, so that the sandbox has no way to
> > ensure the
> > path argument is empty without being able to peek into the
> > sandboxed
> > process's memory. For architectures able to do newfstatat though,
> > the
> > glibc falls back to newfstatat after getting -ENOSYS for statx,
> > then the
> > respective SIGSYS handler [^2] takes care of inspecting the path
> > argument, transforming allowed newfstatat's into fstat instead
> > which is
> > allowed and has the same type of return value.
> > 
> > But, as loongarch is the first architecture to not have fstat nor
> > newfstatat, the LoongArch glibc does not attempt falling back at
> > all
> > when it gets -ENOSYS for statx -- and you see the problem there!
> > 
> > Actually, back when the loongarch port was under review, people
> > were
> > aware of the same problem with sandboxing clone3 [^3], so clone was
> > eventually kept. Unfortunately it seemed at that time no one had
> > noticed
> > statx, so besides restoring fstat/newfstatat to loongarch uapi (and
> > postponing the problem further), it seems inevitable that we would
> > need
> > to tackle seccomp deep argument inspection; this is obviously a
> > decision
> > that shouldn't be taken lightly, so I'm posting this to restart the
> > conversation to figure out a way forward together. We basically
> > could do
> > one of below:
> > 
> > - just restore fstat and be done with it;
> > - add a flag to statx so we can do the equivalent of just fstat(fd,
> > &out) with statx, and ensuring an error happens if path is not
> > empty in
> > that case;
> > - tackle the long-standing problem of seccomp deep argument
> > inspection (!).
> From my point of view, I prefer to "restore fstat", because we need
> to
> use the Chrome sandbox everyday (even though it hasn't been upstream
> by now). But I also hope "seccomp deep argument inspection" can be
> solved in the future.

My idea is this problem needs syscalls to be designed with deep
argument inspection in mind; syscalls before this should be considered
as historical error and get fixed by resotring old syscalls.

> 
> 
> Huacai
> 
> > 
> > Obviously, the simplest solution would be to "just restore fstat".
> > But
> > again, in my opinion this is not quite a solution but a workaround
> > -- we
> > have good reasons to keep just statx (mainly because its feature
> > set is
> > a strict superset of those of fstat/newfstatat), and we're not
> > quite in
> > a hurry to resolve this. Because one of the prerequisites for a new
> > Chromium port is "inclusion in Debian stable" -- as the loong64
> > port
> > [^4] is progressing and the next Debian release would not happen
> > until
> > 2025, we still have time for a more "elegant" solution.
> > 
> > Alternatively, we could also introduce a new flag for statx, maybe
> > named
> > AT_STATX_NO_PATH or something like that, so that statx would ignore
> > the
> > path altogether or error on non-empty paths if called with flags
> > containing AT_EMPTY_PATH | AT_STATX_NO_PATH. This way a seccomp
> > policy
> > could allow statx calls only with such flags (that are passed via
> > register and accessible) and maintain the same level of safety as
> > with
> > fstat. But there is also disadvantage to this approach: the flag
> > would
> > be useful only for sandboxes, because in other cases there's no
> > need to
> > avoid reading from &path. This is also more of a workaround to
> > avoid the
> > deep argument inspection problem.
> > 
> > Lastly, should we decide to go the hardest way, according to a
> > previous
> > mail [^5] (about clone3) and the LPC 2019 discussion [^6] [^7], we
> > probably would try the metadata-annotation-based and piece-by-piece
> > approach, as it's expected to provide the most benefit and involve
> > less
> > code changes. The implementation, as I surmise, will involve
> > modifying
> > the generic syscall entrypoint for early copying of user data, and
> > corresponding changes to seccomp plumbing so this information is
> > properly exposed. I don't have a roadmap for non-generic-entry
> > arches
> > right now, and I also haven't started designing the new seccomp ABI
> > for
> > that. As a matter of fact, members of the LoongArch community
> > (myself
> > included) are still fresh to this area of expertise, so a bit more
> > of
> > your feedback will be appreciated.
> > 
> > Thanks to Miao Wang from AOSC for doing much of the investigation.
> > 
> > [^1]:
> > https://chromium-review.googlesource.com/c/chromium/src/+/2823150
> > [^2]:
> > https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
> > [^3]:
> > https://lore.kernel.org/linux-arch/20220511211231.GG7074@xxxxxxxxxxxxxxxxxxxxx/
> > [^4]: https://wiki.debian.org/Ports/loong64
> > [^5]:
> > https://lwn.net/ml/linux-kernel/201905301122.88FD40B3@keescook/
> > [^6]: https://lwn.net/Articles/799557/
> > [^7]:
> > https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf
> > 
> > --
> > WANG "xen0n" Xuerui
> > 
> > Linux/LoongArch mailing list:https://lore.kernel.org/loongarch/
> >