The Problem(s)
==============

When libvirt was created, C was the only viable choice for anything aiming to be a core system library component. At that time, in 2005, the other common choices were Java, Python and Perl. Java was far too heavy for a low level system component, Python was becoming popular but not widely used for low level system services, and Perl was on a downward trend. None of them are accessible to arbitrary languages as libraries without providing an RPC based API service. As it turns out, libvirt did end up taking an RPC based approach for many virt drivers, but the original approach was to be a pure library component. IOW it is understandable why C was chosen back in 2005, but 12 years on the world around us has changed significantly.

It has long been accepted that C is a very challenging language in which to write "safe" applications. By "safe" I mean avoiding the many problems that lead to critical security bugs. In particular, the lack of a safe memory management framework leads to memory leaks, double frees, stack or heap corruption and more. The lack of strict type safety only compounds these problems. We've got many tools to help us in this area, and at times have tried to design our APIs to avoid problems, but there's no getting away from the fact that even the best programmers will continually screw up memory management, leading to crashes & security flaws. It is just a fact of life when using C, particularly if you want to be fast at accepting new feature proposals. It is no surprise that there have been no new mainstream programming languages in years (decades) which provide an inherently unsafe memory management framework. Even back in 2005 security was a serious challenge, but in the last 10+ years the situation has only got worse, with countless high profile security bugs a direct result of the choice to use C. Given the threats faced today, one has to seriously consider the wisdom of writing any new system software in C.
In another 10 years' time, it would not surprise me if any system software still using C is considered an obsolete relic, and ripe for a rewrite in a memory safe language.

There are long term implications for the potential pool of contributors in the future. There has always been a limited pool of programmers able to do a good job in C, compared to those who know higher level languages like Python/Java. A programmer can write bad code in any language, but in C/C++ that bad code quickly turns into a serious problem. Libvirt has done OK despite this, but I feel our level of contribution, particularly "drive by" patch submissions, is held back by the use of C. Move forward another 10 years, and while C will certainly still exist, I struggle to imagine the talent pool being larger. On the contrary, I would expect it to shrink, certainly in relative terms, and possibly in absolute terms, as other new languages take C's place for low level systems programming. 10 years ago, Docker would have been written in C, but they took the sensible decision to pick Go instead. This is happening everywhere I look, and if not Go, then Rust.

We push up against the boundaries of what's sane to do in C in other ways too. For portability across operating systems, we have to rely on GNULIB to try to sanitize the platform inconsistencies where we use POSIX, and assume that any 3rd party libraries we use have done likewise. Even then, we've tried to avoid using the platform APIs because their designs are often too unsafe to risk using directly (strcat, malloc, free), or are not thread safe (APIs lacking _r variants). So we build our own custom C platform library on top of the base POSIX system, re-inventing the same wheel that every other project written in C invents. Every time we have to do work at the core C platform level, it diverts time away from work managing higher level concepts.
Our code follows an object oriented design in many areas, but such a notion is foreign to C, so we have to bolt a poor man's OO framework onto the side. This feeds back into the memory safety problem, because our OO invention cannot be type checked reliably at compile time, making it easy to do unsafe things with objects. It relies on reference counting because there's no automatic memory management.

The other big trend of the past 10 years has been the increase in CPU core counts. My first libvirt dev machine had 1 physical CPU with no cores or threads or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12 logical CPUs. Common server machines have 32/64 logical CPUs, and high end machines have 100s of CPUs. In 10 years, we'll see high end machines with 1000s of CPUs and entry level machines with mere 100s. IOW good concurrency is going to be key for any scalable application. Libvirt is actually doing reasonably well in this respect via our heavily threaded libvirtd daemon. It is not without cost though, with ever more complex threading & locking models, which still have scalability problems. Part of the problem is that, despite Linux having very low overhead thread spawning, threads still consume non-trivial resources, so we try to constrain how many we use, which forces an M:N relationship between the jobs we need to process and the threads we have available.

The Solution(s)
===============

Two fairly recent languages, Go & Rust, have introduced credible new options for writing systems applications without sacrificing the performance of C, while achieving the kind of ease of use / speed of development seen with languages like Python. It goes without saying that both of them are memory safe languages, immediately solving the biggest risk of using C / C++.

The particularly interesting & relevant innovation of Go is the concept of goroutines for concurrent programming, which provide a hybrid kernel/userspace threading model.
This lowers the overhead of concurrency to the point where you can consider spawning a new goroutine for each logical job. For example, instead of having a single thread or limited pool of threads servicing all QEMU monitor sockets & API clients, you can afford to have a new goroutine dedicated to each monitor socket and API client. That has the potential to dramatically simplify the use of concurrency, while at the same time allowing the code to make even better use of CPUs with massive core counts.

Go of course provides a cross platform portable core library of features, and has a massive ecosystem of developers providing further 3rd party libraries for a wide variety of features. This means developers can focus more time on solving the interesting problems in their application space. The Go code is still low level enough that it can interface with C code easily. FFI calls to C APIs can be made inline in the Go code, with no need to switch out to write a low level binding in C itself. In many ways, Go can be said to have the ease of use, fast learning & safety of Python, combined with the expressiveness of C. IOW it is a better C than C.

I don't have direct experience with Rust, but it has the same kind of benefits over C as Go does, again without the downsides of languages like Python or Java. There are some interesting unique features of Rust that can be important for some apps. In particular, it does not use garbage collection; instead the user must still do manual memory management as you would with C/C++. This allows Rust to be used in performance critical cases where it is unacceptable to have a garbage collector run. Despite the requirement for manual allocation/deallocation, Rust still provides a safe memory model. This approach of avoiding abstractions which introduce performance overhead is a theme of Rust. The cost of such an approach is that development in Rust has a higher learning curve and ongoing cost, as compared to Go.
I don't believe that the unique features of Rust, over Go, are important to the needs of libvirt. eg while for QEMU it would be critical not to have a GC doing asynchronous memory deallocation, this is not at all important for libvirt. In fact, precisely the opposite: libvirt would benefit much more from having a GC take care of deallocation, letting developers focus attention on other areas. In general, aside from having a memory safe language, what libvirt would most benefit from is productivity gains & ease of contribution. This is the core competency of Go, and why it is the right choice for use in libvirt.

The obvious question / difficulty is deciding how to adopt usage of a new language, without throwing everything away and starting from scratch. It needs to be possible for contributors to continue working on every other aspect of the project while adoption takes place over the long term. Blocking ongoing feature work for prolonged periods of time is not acceptable.

There is also a question of the scope of the work. A possible target would be to aim for 100% elimination of C in N years' time (for a value of N that is certainly greater than 5, possibly as much as 10). There is a question of whether that is a good use of resources, and even whether it is practical. In terms of management of KVM guests, the bulk of the ongoing development work and complexity is in the libvirtd daemon. The libvirt.so library merely provides the remote driver client, which is largely stable & unchanging. With this in mind, the biggest benefits would come from tackling the daemon part of the code, where all the complexity lives.

As mentioned earlier, Go has a very effective FFI mechanism for calling C code from Go, and also allows Go code to be called from C. There are some caveats to be aware of with passing data between the languages, however; generally it is necessary to copy data structures, as C code is not permitted to dereference pointers that are owned by the Go GC system.
There are two possible approaches to take, which can be crudely described as top down or bottom up.

In the top down approach, the C file providing the main() method gets replaced by a Go file providing an equivalent main() method, which then simply does an FFI call to the existing libvirt C APIs to run the code. For example, it would just call the virNetServer APIs to set up the RPC layer. Effectively we would have a Go program where 90% of the code is an FFI call to existing libvirt C code. Then we would gradually iterate downwards, converting increasing areas of C code to Go code.

In the bottom up approach, the program remains a C program, but we build .a files containing Go code for core pieces of functionality. The C code can thus call into this archive and end up executing Go code for certain pieces. Then we would gradually iterate upwards, converting increasing areas of C code to Go code, until eventually reaching the top main() method.

Or a hybrid of both approaches can be taken. Whichever way is chosen, it is going to be a long process with many bumps in the road. The best way to start, however, is probably to focus on a simple, self-contained area of libvirt code. Specifically, attack the virtlockd and/or virtlogd daemons, converting them to use Go. This still need not be done in a "big bang". A first phase would be to develop the server side framework for handling our RPC protocol deserialization. This could then just dispatch RPC calls to the existing C impls. As a second phase, the RPC method impls would be converted to Go. Both of these daemons are small enough that the conversion would be possible across the course of a couple of releases. The hardest part is likely ensuring compatibility with the re-exec() upgrade model they support, but this is nonetheless doable.
The lessons learned in this would go a long way towards informing the best way to tackle the bigger task of the monolithic libvirtd (or equivalently the swarm of daemons the previous proposal suggests).

Regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|