On Thu, 22 Feb 2024 07:51:00 -0800
Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 2/22/24 05:12, Petr Tesarik wrote:
> >  static const struct sbm_fixup fixups[] =
> >  {
> > +	/* kmalloc() and friends */
> > +	{ kmalloc_trace, proxy_alloc3 },
> > +	{ __kmalloc, proxy_alloc1 },
> > +	{ __kmalloc_node, proxy_alloc1 },
> > +	{ __kmalloc_node_track_caller, proxy_alloc1 },
> > +	{ kmalloc_large, proxy_alloc1 },
> > +	{ kmalloc_large_node, proxy_alloc1 },
> > +	{ krealloc, proxy_alloc2 },
> > +	{ kfree, proxy_free },
> > +
> > +	/* vmalloc() and friends */
> > +	{ vmalloc, proxy_alloc1 },
> > +	{ __vmalloc, proxy_alloc1 },
> > +	{ __vmalloc_node, proxy_alloc1 },
> > +	{ vzalloc, proxy_alloc1 },
> > +	{ vfree, proxy_free },
> > +
> >  	{ }
> >  };
>
> Petr, thanks for sending this. This _is_ a pretty concise example of
> what it means to convert kernel code to run in your sandbox mode. But,
> from me, it's still "no thanks".
>
> Establishing and maintaining this proxy list will be painful. Folks
> will change the code to call something new and break this *constantly*.
>
> That goes for infrastructure like the allocators and for individual
> sandbox instances like apparmor.

Understood. OTOH the proxy list is there for the PoC, so I could send
something that builds and runs without making the patch series overly
big. As explained in patch 5/5, the goal is not to make a global list.
Instead, each instance should define what it needs, and in doing so
define its specific policy for interfacing with the rest of the kernel.
To give an example, these AppArmor fixups would be added only to the
sandbox which runs aa_unpack(), but not to the one which runs
unpack_to_rootfs(), which is another PoC I did (but it required porting
more patches); a rough sketch of what I mean is in the P.S. below.

If more fixups are needed after you change your code, you know you have
just added a new dependency. It is then up to you to decide whether it
was intentional.

> It's also telling that sandboxing a bit of apparmor took four fixups.
> That tells me we're probably still only looking at the tip of the
> iceberg if we were to convert a bunch more sites.

Yes, that is the cost of getting code and data flows under control. In
your opinion, this kind of memory safety is not worth the effort of
explicitly defining the interface between a sandboxed component and the
rest of the kernel, because it increases maintenance costs. Correct?

> That's on top of everything I was concerned about before.

Good, I think I can understand the new concern, but regarding
everything you were concerned about before, this part is still not
quite clear to me. I'll try to summarize the points:

* Running code in ring-0 is inherently safer than running code in
  ring-3.

Since what I'm trying to do is protect kernel data structures from
memory safety bugs in another part of the kernel, this roughly
translates to: "Kernel data structures are better protected from rogue
kernel modules than from userspace applications." This cannot possibly
be what you are trying to say.

* SMAP, SMEP and/or LASS can somehow protect one part of the kernel
  from memory safety bugs in another part of the kernel.

I can't see how that is the case. I have always thought that:

  * SMEP prevents the kernel from executing code in user pages.
  * SMAP prevents the kernel from reading from or writing to user
    pages.
  * LASS does pretty much the same job as SMEP+SMAP, but instead of
    relying on page table protection bits, it uses the highest bit of
    the virtual address, because that is much faster.
* Hardware designers are adding (other) hardware security defenses to
  ring-0 that are not applied to ring-3.

Could you give an example of these other security defenses, please?

* Ring-3 is more exposed to attacks.

This statement sounds a bit too vague on its own. Which attack vectors
are we talking about? The primary attack vector that SBM tries to
address is exploitation of kernel code vulnerabilities triggered by
data from sources outside the kernel (boot loader, userspace, etc.).

H. Peter Anvin added a few other points:

* SBM has all the downsides of a microkernel without the upsides.

I can only guess what the downsides and upsides would be... One
notorious downside is performance. Agreed, there is some overhead. I am
not promoting SBM for time-critical operations. But compared to
user-mode helpers (which were suggested as an alternative for one of
the proposed scenarios), the overhead of SBM is at least an order of
magnitude lower. IPC and the need to define how servers interact with
each other is another downside I can think of. Yes, there is a bit of
that in SBM, as you have correctly noted above.

* SBM introduces architectural changes that are most definitely *very*
  harmful both to maintainers and users.

It is very difficult to learn anything from this statement. Could you
give some examples of how SBM harms either group, please?

* SBM feels like paravirtualization all over again.

All right, hpa, you've had lots of pain with paravirtualization. I feel
for you; I've had my share of it too. Can you imagine how much trouble
I could have spared myself in the libkdumpfile project if I didn't have
to deal with the difference between "physical addresses" and "machine
addresses"? However, this is hardly a relevant point. The Linux kernel
community is respected for making decisions based on facts, not
feelings.

* SBM exposes kernel memory to user space.

This is a misunderstanding. Sandbox mode does not share anything at all
with user mode. It does share some CPU state with kernel mode, but not
with user mode. If "user space" was intended to mean "ring-3", then the
statement still does not explain why that is such a bad idea.

* SBM is not needed, because there is already eBPF.

Well, yes, but I believe they work at different levels. For example,
eBPF needs a verifier to ensure memory safety. If you run the eBPF code
itself in a sandbox instead, that verifier is not needed, because
memory safety is enforced by the CPU hardware.

When hpa says that SandBox Mode is "an enormous step in the wrong
direction", I want to understand why this direction is wrong, so I can
take a step in the right direction next time. So far there has been
only one objective concern: the need to track code (and data)
dependencies explicitly. AFAICS this is an inherent drawback of any
kind of program decomposition. Is decomposition considered harmful?

Petr T
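
P.S. To make the per-instance fixup idea above a bit more concrete,
here is a rough sketch of what I have in mind for the AppArmor case.
The struct sbm_fixup layout and the proxy_alloc1()/proxy_free() helpers
are the ones from the PoC patches; sbm_exec() and aa_unpack_sandboxed()
are made-up placeholder names, and the fixup entries and arguments are
abbreviated for illustration rather than the exact set a real
conversion would need:

/* Fixups local to the AppArmor policy-unpacking sandbox only. */
static const struct sbm_fixup aa_unpack_fixups[] =
{
	{ __kmalloc, proxy_alloc1 },	/* allocations aa_unpack() relies on */
	{ kfree, proxy_free },
	{ }				/* terminator */
};

/* Hypothetical wrapper; aa_unpack() arguments omitted for brevity. */
static int aa_unpack_sandboxed(struct aa_loaddata *data)
{
	/*
	 * Run aa_unpack() in a sandbox whose only interface to the rest
	 * of the kernel is aa_unpack_fixups[]. If aa_unpack() grows a
	 * new dependency, it shows up as a missing fixup, which is
	 * exactly the point: the policy is explicit and local to this
	 * one instance, not a global list.
	 */
	return sbm_exec(aa_unpack_fixups, aa_unpack, data);
}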