* Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> On Sat, Apr 27, 2019 at 3:46 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > So I'm wondering whether there's a 4th choice as well, which avoids
> > control flow corruption *before* it happens:
> >
> >  - A C language runtime that is a subset of current C syntax and
> >    semantics used in the kernel, and which doesn't allow access outside
> >    of existing objects and thus creates a strictly enforced separation
> >    between memory used for data, and memory used for code and control
> >    flow.
> >
> >  - This would involve, at minimum:
> >
> >    - tracking every type and object and its inherent length and valid
> >      access patterns, and never losing track of its type.
> >
> >    - being a lot more organized about initialization, i.e. no
> >      uninitialized variables/fields.
> >
> >    - being a lot more strict about type conversions and pointers in
> >      general.
>
> You're not the only one to suggest this.  There are at least a few
> things that make this extremely difficult if not impossible.  For
> example, consider this code:
>
> void maybe_buggy(void)
> {
>     int a, b;
>     int *p = &a;
>     int *q = (int *)some_function((unsigned long)p);
>     *q = 1;
> }
>
> If some_function(&a) returns &a, then all is well.  But if
> some_function(&a) returns &b or even a valid address of some unrelated
> kernel object, then the code might be entirely valid and correct C,
> but I don't see how the runtime checks are supposed to tell whether
> the resulting address is valid or is a bug.  This type of code is, I
> think, quite common in the kernel -- it happens in every data
> structure where we have unions of pointers and integers or where we
> steal some known-zero bits of a pointer to store something else.

So the thing is, for the infinitely large state space of "valid C code" we
already disallow infinitely many variants of it in the Linux kernel.

We have complicated rules that disallow certain C syntactic and semantic
constructs, both on the tooling (build failure/warning) and on the review
(style/taste) level.

So the question IMHO isn't whether it's "valid C", because we already have
the Linux kernel's own C syntax variant and are enforcing it with varying
degrees of success.

The question is whether the example you gave can be written in a strongly
typed fashion, whether it makes sense to do so, and what the costs are.

I think it's evident that it can be written with strongly typed
constructs, by separating pointers from embedded error codes - with
negative side effects on code generation: for example, it increases
structure sizes and adds error return paths.

I think there are four main costs and benefits of converting such a
pattern to strongly typed constructs:

 - memory/cache footprint: there's a nonzero cost there.

 - performance: this will hurt too.

 - code readability: this will probably improve.

 - code robustness: this will improve too.

So I think the proper question to ask is not whether there's common C
syntax within the kernel that would have to be rewritten, but whether the
total memory and runtime overhead of strongly typed C programming (if it's
possible/desirable) is larger than the total overhead a typical Linux
distro incurs by enabling the various current and proposed kernel
hardening features that have a runtime cost:

 - the SMAP/SMEP overhead of STAC/CLAC for every single user copy

 - other usercopy hardening features

 - stackprotector

 - KASLR

 - compiler plugins against information leaks

 - proposed KASLR extension to implement module randomization and -PIE
   overhead

 - proposed function call integrity checks

 - proposed per system call kernel stack offset randomization

 - ( and I'm sure I forgot about a few more, and it's all still only
     reactive security, not proactive security. )

That's death by a thousand cuts, and CR3 switching during system calls is
also throwing a hand grenade into the fight ;-)

So if people are also proposing to do CR3 switches in every system call,
I'm pretty sure the answer is "yes, even a managed C runtime is probably
faster than *THAT* sum of a performance mess" - at least with the current
CR3 switching x86-uarch cost structure...

Thanks,

        Ingo
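
To illustrate the "separating pointers from embedded error codes"
trade-off discussed above, here is a minimal userspace sketch, not kernel
code. The err_ptr()/is_err()/ptr_err() helpers are simplified stand-ins
modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(); the table, the
lookup_*() functions and struct lookup_result are hypothetical names made
up purely for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's error-pointer helpers:
 * the error code lives inside the pointer value itself.
 */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical object table, for illustration only. */
static int table[4] = { 10, 20, 30, 40 };

/*
 * Variant 1: a single machine word is returned, but the type system
 * cannot tell a valid pointer from an encoded error.
 */
static int *lookup_embedded(size_t idx)
{
	if (idx >= 4)
		return err_ptr(-ENOENT);
	return &table[idx];
}

/* Variant 2: pointer and error code are separate, typed fields. */
struct lookup_result {
	int *obj;	/* valid only when err == 0 */
	int err;	/* 0 on success, negative errno otherwise */
};

/* The footprint cost: the typed result is larger than a bare pointer. */
static_assert(sizeof(struct lookup_result) > sizeof(int *),
	      "typed result should be larger than a bare pointer");

static struct lookup_result lookup_typed(size_t idx)
{
	struct lookup_result res = { .obj = NULL, .err = 0 };

	if (idx >= 4)
		res.err = -ENOENT;
	else
		res.obj = &table[idx];
	return res;
}
```

A caller of lookup_typed() checks res.err before touching res.obj, so the
pointer is never reinterpreted as an integer. The price is the larger
return value and an explicit error branch at each call site, which is the
footprint and performance cost listed above.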