On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
>> On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
>> <roberto.sassu@xxxxxxxxxxxxxxx> wrote:
>>> I wanted to execute some kernel workloads in a fully isolated user
>>> space process, started from a binary statically linked with klibc,
>>> connected to the kernel only through a pipe.
>>
>> FWIW, the kernel has some infrastructure for this already, see
>> CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
>> example in net/bpfilter/.
>
> Thanks, I actually took that code to make a generic UMD management
> library, that can be used by all use cases:
>
> https://lore.kernel.org/linux-kernel/20230317145240.363908-1-roberto.sassu@xxxxxxxxxxxxxxx/
>
>>> I also wanted that, for the root user, tampering with that process is
>>> as hard as if the same code runs in kernel space.
>>
>> I believe that actually making it that hard would probably mean that
>> you'd have to ensure that the process doesn't use swap (in other
>> words, it would have to run with all memory locked), because root can
>> choose where swapped pages are stored. Other than that, if you mark it
>> as a kthread so that no ptrace access is allowed, you can probably get
>> pretty close. But if you do anything like that, please leave some way
>> (like a kernel build config option or such) to enable debugging for
>> these processes.
>
> I didn't think about the swapping part... thanks!
>
> Ok to enable debugging with a config option.
>
>> But I'm not convinced that it makes sense to try to draw a security
>> boundary between fully-privileged root (with the ability to mount
>> things and configure swap and so on) and the kernel - my understanding
>> is that some kernel subsystems don't treat root-to-kernel privilege
>> escalation issues as security bugs that have to be fixed.
>
> Yes, that is unfortunately true, and in that case the trustworthy UMD
> would not make things worse.
> On the other hand, on systems where that
> separation is defined, the advantage would be to run more exploitable
> code in user space, leaving the kernel safe.
>
> I'm thinking about all the cases where the code had to be included in
> the kernel to run at the same privilege level, but would not use any of
> the kernel facilities (e.g. parsers).

Thanks for reminding me of kexec-tools. The complete image for booting
a new kernel was originally prepared in user space. With kernel
lockdown, all this code had to move into the kernel, adding a new
syscall and lots of complexity to build purgatory code, etc. Yet, this
new implementation in the kernel does not offer all features of
kexec-tools, so both code bases continue to exist and are happily
diverging...

> If the boundary is extended to user space, some of these components
> could be moved away from the kernel, and the functionality would be the
> same without decreasing the security.

All right, AFAICS your idea is limited to relatively simple cases for
now. I mean, allowing kexec-tools to run in user space is not easily
possible when UID 0 is not trusted, because kexec needs to open various
files and make various other syscalls, which would require a complex
LSM policy. It looks technically possible to write one, but then the
big question is whether it would be simpler to review and maintain than
adding more kexec-tools features to the kernel.

Anyway, I can sense a general desire to run less code in the most
privileged system environment. Roberto's proposal is one of the few
that go in this direction. What are the alternatives?

Petr T