On 04/15/2013 09:04 PM, Björn Persson wrote:
> Miloslav Trmač wrote:
>> The logical conclusion from this is to move to a language with automatic
>> memory management. The "top vulnerability" reports for programs written in
>> C/C++ and most other languages are so different that starting a new project
>> that processes untrusted data in C/C++ is becoming indefensible.
> If by "automatic memory management" you mean garbage collection, then
> that's not really what we need. Garbage collection has advantages, but
> what is needed to stop the buffer overflows is bounds checking. The
> compiler needs to keep track of how big each object is and insert code
> to check that writes to an array stay within the bounds of the array.
There's also the issue of dangling pointers (pointers which point to a
memory location which now holds an object of a different type). They
can result from misapplied memory management, or from type safety
loopholes in the language definition. An example for Ada is here:
<http://www.enyo.de/fw/notes/ada-type-safety.html>
(See the postscript—this was already known in the Ada 83 days. I still
find it remarkable. It's possible to work around this in a GC-based
implementation.)
>> Now, what to move to? I currently don't see any language/runtime I
>> could recommend, which is in itself rather frightening.
> I recommend Ada. Ada does bounds checking, and is compiled to machine
> code with performance comparable to C.
Yes, Ada has some nice features. At least there are real arrays, but
they are somewhat cumbersome to work with, compared to Java, Python or,
well, C pointers. There are two aspects: preservation of array bounds
in slices (so that you have to write Table (Table'First + Offset) to
access the element Offset of Table, Offset ranging from 0 to
Table'Length - 1), and the fact that it is impossible to put an
unconstrained array (of arbitrary length) into a constrained object
(i.e., you need an indirection).
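
To make the two points concrete, here is a minimal sketch. It is not taken from any existing code base, and the names Int_Array, Holder and Element_At are made up for illustration. It shows that a slice keeps the bounds of the array it was taken from, and that a record can only hold an array of arbitrary length through an access value:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Slice_Demo is
   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Access is access Int_Array;

   --  A record component cannot be an unconstrained array, so the
   --  record holds an access value (the indirection) instead.
   type Holder is record
      Items : Int_Array_Access;
   end record;

   Data : constant Int_Array (1 .. 8) := (10, 20, 30, 40, 50, 60, 70, 80);
   H    : Holder;

   --  Zero-based access that works for any bounds, including slices.
   function Element_At (Table : Int_Array; Offset : Natural) return Integer is
   begin
      return Table (Table'First + Offset);
   end Element_At;
begin
   --  The slice keeps the bounds 3 .. 6; it is not renumbered from 1.
   Put_Line (Integer'Image (Element_At (Data (3 .. 6), 0)));

   --  The allocated copy also carries the bounds 3 .. 6 with it.
   H.Items := new Int_Array'(Data (3 .. 6));
   Put_Line (Integer'Image (H.Items'First));
end Slice_Demo;
```

This should print 30 and then 3: the slice Data (3 .. 6) still starts at index 3, so zero-based access has to go through Table'First. Without garbage collection, the allocated array would have to be freed explicitly, for example with an instance of Ada.Unchecked_Deallocation.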
For many programming tasks, arrays might be at the wrong level of
abstraction, but we have a lot of plumbing code which uses them heavily.
Garbage collection support would make it easier to introduce the
indirection, but it would require a conservative collector at present,
and those we have right now (Boehm-Demers-Weiser and the Go collectors)
require a process-global view, touch signal handlers etc., so they do
away with one significant Ada advantage (see below).
> Only compiler bugs can cause
> buffer overflows in Ada, unless you're so foolhardy that you disable the
> bounds checking.
The GNAT run-time is compiled without language-defined checks, and it
used to have at least one buffer overflow in the Ada part. Many Ada
libraries used to follow GNAT's example and disabled the checks as well,
but this has changed during the last few years, it appears. Manual
overflow checks are hampered by the fact that -gnato still isn't the
default.
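
For illustration, a minimal sketch of what the language-defined index check does when it is left enabled. The names Buffer and Store are invented for this example and do not come from the GNAT run-time:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Bounds_Demo is
   type Buffer is array (1 .. 4) of Character;

   --  Store C at Index. The compiler inserts the index check; there is
   --  no explicit range test in the source.
   procedure Store (B : in out Buffer; Index : Integer; C : Character) is
   begin
      B (Index) := C;
   end Store;

   Buf : Buffer := (others => ' ');
begin
   Store (Buf, 4, 'x');  --  last valid index
   Store (Buf, 5, 'y');  --  one past the end
   Put_Line ("not reached while index checks are enabled");
exception
   when Constraint_Error =>
      Put_Line ("Constraint_Error raised, Buf (4) = " & Buf (4));
end Bounds_Demo;
```

With the default GNAT settings the second call should raise Constraint_Error, so the handler's message is printed. Building with -gnatp, or adding pragma Suppress (Index_Check), removes the check, and the out-of-range store is then no longer caught. Integer overflow checks are a separate category, which is what -gnato controls.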
> Ada doesn't do garbage collection across the whole program, but features
> such as controlled types, generic data structures and out parameters
> greatly reduce the need for garbage collection. The double-free problem
> is also eliminated. (Garbage collection was made optional in Ada so
> that the language would be suitable for embedded real-time systems, and
> in practice most compilers don't provide it.)
Controlled types have a fixed overhead which is quite visible with small
objects. By default, code for abort deferral is emitted, the vtable
pointer takes space, and avoiding unnecessary indirect calls takes some
care by the programmer. There's also no well-defined ABI for shared
libraries (and adding a subprogram can change the name of existing
subprograms).
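
As a reminder of what a controlled type looks like in source code, here is a small sketch. The package Resources and the type Handle are hypothetical. Finalize runs automatically when the object leaves its scope, which is the mechanism that takes the place of manual cleanup:

```ada
with Ada.Finalization;
with Ada.Text_IO; use Ada.Text_IO;

procedure Controlled_Demo is

   --  Declaring the type in a nested package requires Ada 2005 or later.
   package Resources is
      Released : Natural := 0;  --  number of Finalize calls so far

      type Handle is new Ada.Finalization.Limited_Controlled with record
         Id : Natural := 0;
      end record;

      overriding procedure Finalize (H : in out Handle);
   end Resources;

   package body Resources is
      overriding procedure Finalize (H : in out Handle) is
      begin
         Put_Line ("releasing handle" & Natural'Image (H.Id));
         Released := Released + 1;
      end Finalize;
   end Resources;

begin
   declare
      H : Resources.Handle;
   begin
      H.Id := 7;
   end;  --  Finalize (H) is called here, without any explicit call
   Put_Line ("finalize calls:" & Natural'Image (Resources.Released));
end Controlled_Demo;
```

Every Handle object carries a tag, and Finalize is reached through a dispatching call. That is the kind of fixed cost that becomes noticeable when the objects themselves are small.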
On the other hand, lack of garbage collection means that it's feasible
to have some GNAT-compiled part in a larger program, without the larger
program noticing that there's a component not written in C. I sometimes
call this "deep embedding support", and only very few language
implementations have this property at present. (Even with GNAT, you
have to restrict yourself to a language subset.) The list of feasible
systems programming languages is much, much longer, but most need global
run-time state, threads, or signal handler manipulation, have address
space layout requirements, and so on. But that is primarily an implementation issue,
not an aspect which is inherent to most languages.
The other aspect is low baseline overhead from the run-time system. We
don't want programmers to rewrite working system components in C only to
reduce memory usage. This is what happened (or is expected to happen)
to some daemons written in Python.
--
Florian Weimer / Red Hat Product Security Team