Re: Redesigning Libvirt: Adopting use of a safe language

On Mon, Nov 20, 2017 at 03:25:33PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 20, 2017 at 12:24:22AM +0100, Martin Kletzander wrote:
On Tue, Nov 14, 2017 at 05:27:01PM +0000, Daniel P. Berrange wrote:

[...]

> I don't have direct experience in Rust, but it has the same kind of benefits over
> C as Go does, again without the downsides of languages like Python or Java. There
> are some interesting unique features to Rust that can be important to some apps.
> In particular it does not use garbage collection, instead the user must still do
> manual memory management as you would with C/C++. This allows Rust to be used in
> performance critical cases where it is unacceptable to have a garbage collector
> run. Despite a requirement for manual allocation/deallocation, Rust still
> provides a safe memory model. This approach of avoiding abstractions which will
> introduce performance overhead is a theme of Rust. The cost of such an approach
> is that development has a higher learning curve and ongoing cost in Rust, as
> compared to Go.
>
> I don't believe that the unique features of Rust, over Go, are important to the
> needs of libvirt. eg while for QEMU it would be critical to not have a GC
> doing asynchronous memory deallocation, this is not at all important to libvirt.
> In fact precisely the opposite, libvirt would benefit much more from having GC
> take care of deallocation, letting developers focus attention on other areas. In
> general, aside from having a memory safe language, what libvirt would most benefit
> from is productivity gains & ease of contribution. This is the core competency
> of Go, and why it is the right choice for usage in libvirt.
>

So the first thing I disagreed on is that in Rust you do manual
allocations.  In fact, you don't.  Or, depending on the point of view,
you do less manual allocation than in Go, or the same amount.  The clear
win for Rust is the concept of ownership, which is related to the
allocation mentioned before.

I shouldn't have used the word "allocation" in my paragraph above. As
you say, both languages have similar needs around allocation. The difference
I meant is around deallocation policy - in Rust, object lifetime is a more
explicit decision under the control of the programmer, as opposed to Go's garbage
collection.  From what I've read, Rust's approach to deallocation is much
closer to the C++ concept of "smart pointers", eg this

 http://pcwalton.github.io/blog/2013/03/18/an-overview-of-memory-management-in-rust/


This is kind of old, that code wouldn't run with newer Rust.  I guess
that is from long ago, when it was not stabilized at all.  It is a bit
smarter now.  The fact that you have control over when the value is
getting freed is true, however you rarely have to think about that.
What's more important is that the compiler prevents you from accessing
a value from multiple places or not knowing who "owns" (think of it as
"who should take care of freeing it") the variable.  If you give the
ownership to someone, you can't access it any more.  The difference I see
is that if you access it after some other part of the code has become
responsible for that variable, in Rust the compiler will cut you off
unless you clearly specify how the memory space related to the variable
is supposed to be handled.  In Go it will just work (with a potential
bug), but it will not crash because the GC will not clean it up while
someone can still access it.  Granted, this is usually a problem with
concurrent threads/coroutines (and I don't know how Go handles concurrent
access there).  Also, for example, Rust doesn't allow you to have a value
accessible from multiple threads unless it is guarded by a thread-safe
reference counter, and does not allow you to modify it unless it is
guarded by a Mutex, an RwLock or one of the Atomic types.  Again, I
don't know Go that much, I have yet to delve into the deep unknowns of
it, but I haven't heard about it providing such safety.
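
To make the ownership and thread-safety points a bit more concrete, here
is a minimal sketch.  The function names consume() and parallel_count()
are made up for illustration, they don't come from libvirt or any real
project, and it only uses the standard library:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: takes ownership of the buffer and returns its length.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
} // `buf` is freed here; the caller gave ownership away

// Hypothetical helper: several threads increment one shared counter.
// Sharing requires Arc (thread-safe reference counter), mutation requires
// the Mutex; leaving either one out is a compile error, not a runtime bug.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    let buf = vec![1u8, 2, 3];
    let n = consume(buf);
    // println!("{}", buf.len()); // would not compile: `buf` was moved
    println!("consumed {} bytes", n);
    println!("counter = {}", parallel_count(4, 1000));
}
```

Uncommenting the line that touches buf after the call is what I mean by
the compiler cutting you off: the build fails instead of the program
misbehaving later.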

I am standing strongly behind the opinion that the learning curve of
Rust is definitely worth it.  And coming from the C world, it is easy to
understand.  To me, it is very easy to explain that concept in great
detail to someone who has a background in libvirt.  And the big benefit
(and still a huge opportunity for improvement WRT optimizations) is that
the compiler must know about it and so it is resolved at compile time.
Deallocation and destructors are run at the end of the scope,
automatically.  You can nicely see that when realizing that Rust doesn't
need any `defer` as Go has.
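
A small sketch of what I mean by cleanup at the end of scope.  The Guard
type and the run() function are hypothetical names invented for this
example; the only real API used is the standard Drop trait:

```rust
use std::cell::RefCell;

// Hypothetical resource that records its own release in a shared log.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Guard<'a> {
    // Called automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

// Returns the sequence of events so the drop order can be inspected.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Guard { name: "a", log: &log };
        let _b = Guard { name: "b", log: &log };
        log.borrow_mut().push("work".to_string());
    } // no explicit cleanup call: `_b` is dropped first, then `_a`
    log.into_inner()
}

fn main() {
    for event in run() {
        println!("{}", event);
    }
}
```

The guards are released in reverse order of declaration when the inner
block ends, which is the same ordering stacked `defer` statements would
give you, only without having to write them.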

Nb the 'defer' concept isn't really about memory management per-se, rather
it is focused on cleanup of related resources - eg deciding when to close
an open file handle, or when to release another resource. Everything that's
just Go object memory is handled by the GC.

One of the things is that the kinks we have can be ironed out in C as
well.  It might be easier in other languages, but it is harder when you
have to switch to one.  We have a bunch of code dealing with backwards
compatibility.  And I argue that this is something that causes issues on
its own.  What's even worse, IMHO, is that we are so feature-driven
that there is no time for any ironing.  I see too much potential for
refactoring in various parts of libvirt that will never see the light
of day because we need X to be implemented.  And contributors sending
feature requests that they fail to maintain later don't help much with
that.  Maybe we could fix this by saying the next Y releases will just
be bugfix releases.  Maybe we could help bring in new contributors by
devoting some of our time to do an actual change that will make them
want to help us more.  I know some of you will be sick and tired of hearing
about Rust once more, but have you heard about how much their community
is inclusion-oriented?  I guess what I'm trying to say is that there are
other (and maybe less disruptive) ways to handle the current problems we
are facing.

I'm not going to debate that there's plenty of problems we could be
tackling, and changing language is not a magic bullet for all of
them. My primary motivation is to start to get out of the world where
we have random crashes & security bugs due to overflowing buffers,
double frees, use after free, and so on. Problems that have been
solved by every language invented since C in the last 30-40 years.

On the choice of language, I will say C is a turn off to many people
as it is (not unreasonably) viewed as archaic, hard to learn and
difficult to write good code in.


I would say this highly depends on what area you are coming from.  In
the cloud world it would be viewed way differently than in the
dark^Wlow-level side of things.  I don't want to compare libvirt to the
kernel for example, but I haven't heard about C being a turn-off there.

When I worked in OpenStack it was a constant battle to get people to
consider enhancements to libvirt instead of reinventing it in Python.
It was a hard sell because most Python devs just didn't want to use C
at all because it has a steep learning curve for contributors, even if libvirt
as a community is welcoming. As a result OpenStack pretty much reinvented

I'm sorry, but I think only a handful of us (yeah, I think and hope I
could count myself amongst that group) are welcoming.  But it's
actually where I see one of the big turn-offs.

its own hypervisor agnostic API for esx, hyperv, xenapi and KVM instead
of enhancing libvirt's support for esx, hyperv or xenapi. They only end
up using libvirt for KVM and libxl really. I hear similar comments from
people working in virt related projects in Go. So use of C really does
have an impact on our pool of potential contributors. This is disappointing

It's hard to guess how that would turn out if C wasn't a turn-off for
them.  Maybe we would get a bunch of patchsets submitted that would be in
better shape language-wise, but would that help with understanding
various internal structures and behaviours of libvirt?  Not considering
the fact that some language would make it more readable.  Or would we
just get a bunch more drive-by patches that we would have to fix/maintain?
I don't think we can answer that.

as there are huge numbers of people working on virt related projects, they
just don't ever go near C - virt-viewer & virsh are probably the only
"apps" using libvirt from C - all the others use a higher level language
via one of our bindings.


And then there are the "issues" with Go (and unfortunately some with
Rust as well :'( ).

Yep, no choice is ever perfect.

A lot of the code for libraries is written with permissive licences, but
if some of it is LGPL-incompatible we can't use it.  And in
ecosystems such as Rust and Go there are fewer alternatives, so we might
not find one that we'll be able to use.  If that happens, there goes
a bunch of our time.  Like nothing.

I've not looked at the Rust ecosystem, but from what I've seen in Go most
devs tend to go for even more permissive licensing, ie BSD/Apache/MIT.
Disappointingly few people pick GPL variants :-(  So aside from the
complication with our virtualbox code being GPLv2-only, I don't think
there's a license problem to worry about with Go, any more than we
have to worry about with C today.


OK, and ...

How do we deal with problems/bugs in dependency libraries?  I know, all
the projects are pretty new, so they might be nicer to contributors.  If
they are not, we might need to fork or rewrite the code.  Bam, another
chance of losing workforce.

Many C libs we depend on have been around a long time so are more
mature, but are also more conservative in accepting changes,
especially if they touch API. On balance I don't think there would
be a big difference either way in this area.


... OK, I hope so and if you say so I will blindly believe that ;)


How does Go handle updates in dependency libs?  Does it automatically
pull newest version from public repositories where some unknown person
can push whatever they want?  Or can they be hash- or version-bound?

Originally it was quite informal (and awful) - your 3rd party deps
just had to be present in $GOPATH, and there was no tracking of
versions. No significant sized project works this way anymore
because it is insane.

Instead Go introduced a concept they call "vendoring", where
you have a top level dir called vendor/ where all your deps live.
Your app provides a metadata file (in JSON typically) where you
list the deps and the preferred versions you need. The tool then
populates vendor/ with the right code to build against. Think
of vendor/ as being sort of like Git submodules, but not using
Git submodules, and you'll be thinking along the right lines.
Go libraries are being strongly encouraged to adopt semver for
their versioning.


Oh, good, I was hoping for this.

The build process is that all the binaries are static, right?  Or all
the go code is static and it only has dynamic dependencies on C
libraries? In such a project as libvirt, wouldn't that mean that the
processes we run will be pretty heavy-weight?

Yes, all the Go code is statically linked - only C libs are dyn
loaded. This doesn't have any impact on runtime, because even if
the binary was 100's of MB in size, the kernel is only ever going
to page in sections of that file which are actually executed.


But the code that is used by each binary will be present as many times
as that binary is running since it is not dynamically loaded, right?

The main impact is that if a dependency gets an update (eg for
a security fix) all downstream apps need rebuilding.


Well, that sucks, but it's not a deal-breaker.

How is it with rebuilding after a small change?  I know Go is good when
it comes to compilation times.  That might be something that people
might like a lot.  Especially those who are trying to shave off every
second of compilation.  However if you cannot use ccache and you always
need to rebuild everything, it might increase the build-time quite a
lot, even though indirectly.

Compile times are great - the compiler is very fast, and it does
caching on a per-package basis. NB in Go a "package" is an individual
directory in your source tree, so a typical app would have 10's
or 100's of packages, each corresponding to a separate subdir.
Source dirs are probably more fine grained than we use in libvirt.
eg what's in src/qemu in libvirt currently would likely end up
being spread across as many as 5 packages if it were idiomatic
Go.

The biggest win though comes from not needing autoconf, automake.
Of course libvirt wouldn't see that benefit as long as any of our
code were still C, so I won't claim that's a win in this particular
case.

You can't have /tmp mounted with noexec option unless you have TMPDIR
set to some other directory.  And sometimes not even in that case.  I
guess it's a non-issue for some distros, but a bunch of people deal with that and
it seems like something that could be taken care of in Go itself and it
is just not.

I've not heard of that one before, and so obviously not hit it.
From Google it seems this applies if you use 'go run' or 'go test'
commands. The former is not something you typically use, but the
latter is. It can be avoided by having the make/shell run that
invokes 'go test' set a local TMPDIR - no need to set it globally
in your bash profile. I'm not sure what distros have /tmp with
noexec - Fedora doesn't at least.


Usually if you don't have SELinux and want to be guarded a tiny bit
more.  Like me.  Maybe I'll change that someday.  I managed to fix that
by having entry and exit scripts that mangle the TMPDIR env var for me.
And it works in some cases.

You can't just clone a repo, cd into it and build it.  You have to get
the dependencies in a special manner, for that you have to have GOPATH
set and based on that you have to have your directories setup similarly
to what Go expects, then you need GOBIN if you want to build something
and other stuff that's just not nice and doesn't make much sense.  At
least for newcomers.  Simply the fact that it seems to me like Go is
trying to go against the philosophy of "Do one thing and do it well".  I
know everyone is about having the build described in the same language
as the project, but what comes out of it is not something I prefer.

This is related to the thing I mentioned above where historically
everything was just splattered into $GOPATH. Most apps have gone
towards the vendoring concept where every dep is self-contained
in your local checkout.


I have to read up on the basics on how to do a proper first-time setup
for Go.  And maybe everything will be sunshine and rainbows from that
point forward.  It's just that I don't like the fact that it's
non-intuitive.  Is it possible somehow to just build some code without
actually dealing with a bunch of dependencies and all the setup?  To give
an example, is there a way to differentiate the compiler from the
dependency handling and build processes?  Like gcc and
autoconf/automake/make or rustc and cargo?

I'm not against using another language to make some stuff better.  I
guess it is kind of visible from the mail that I like Rust, but I'm not
against other languages as well.  I just want this to be a full-on
discussion and I want my opinion to be expressed.  Thanks for
_listening_ ;)

Either Rust or Go would be a step forward over staying exclusively
with C IMHO, so at least we agree that there are potential benefits
to either :-)


Yeah.  Good luck with evaluating the responses.  I hope we won't need to
change our Code of Conduct or even resort to voting...

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
