Following up with some general comments on the main downsides of containers and on the upsides that led us down this path in the first place. Aside from a few minor misunderstandings, it seems like most of the objections to containers boil down to a few major points:

> Containers are more complicated than packages, making debugging harder.

I think part of this comes down to a learning curve and some semi-arbitrary changes to get used to (e.g., the systemd unit name has changed; logs are now in /var/log/ceph/$fsid instead of /var/log/ceph). Another part of these changes involves real hoops to jump through: to inspect process(es) inside a container you have to `cephadm enter --name ...`; the ceph CLI may not be automatically installed on every host; stracing or finding coredumps requires extra steps. We're continuing to improve the tools, so please call these things out as you see them!

> Security (50 containers -> 50 versions of openssl to patch)

This feels like the most tangible critique. It's a tradeoff. We have had so many bugs over the years due to varying versions of our dependencies that containers feel like a huge win: we can finally test and distribute something that we know won't break due to some random library on some random distro. But it means the Ceph team is on the hook for rebuilding our containers when the libraries inside them need to be patched.

On the flip side, cephadm's use of containers offers some huge wins:

- Package installation hell is gone. Previously, ceph-deploy and ceph-ansible had thousands of lines of code to deal with the myriad ways that packages could be installed and where they could be published. With containers, this now boils down to a single string, which is usually just something like "ceph/ceph:v16". We've grown a handful of complexity there to let you log into private registries, but otherwise things are so much simpler. Not to mention what happens when package dependencies break.

- Upgrades/downgrades can be carefully orchestrated. With packages, the version change is per host, with a limbo period (and the occasional SIGBUS) before daemons are restarted. Now we can run new or patched code on individual daemons and avoid an accidental upgrade when a daemon restarts. (Also, running e.g. ceph CLI commands no longer errors out with a dynamic linker error while the package upgrade itself is in progress, something all of our automated upgrade tests have to carefully avoid to prevent intermittent failures.)

- Ceph installations are carefully sandboxed. Removing/scrubbing ceph from a host is trivial, as only a handful of directories or configuration files are touched. And we can safely run multiple clusters on the same machine without worrying about bad interactions (mostly great for development, but also handy for users experimenting with new features, etc.).

- Cephadm deploys a bunch of non-ceph software as well to provide a complete storage system, including haproxy and keepalived for HA ingress for RGW and NFS, ganesha for NFS service, grafana, prometheus, node-exporter, and (soon) samba for SMB. All of it is neatly containerized to avoid bumping into other software on the host; testing and supporting the huge matrix of package versions available via various distros would be a huge time sink.

Most importantly, cephadm and the orchestrator API vastly improve the overall ceph experience from the CLI and dashboard. Users no longer have to give any thought to where and which daemons run if they don't want to (or they can carefully specify daemon placement if they choose).
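To make the day-to-day flow a bit more concrete, here's a rough sketch of the kinds of commands involved in the points above (inspecting a containerized daemon, finding logs, explicit placement, orchestrated upgrades). The hostnames, $fsid, daemon names, and image tag are placeholders, and exact flags can vary between releases, so treat this as a sketch rather than a recipe:

    # list the daemons cephadm manages on this host
    cephadm ls

    # shell into a daemon's container to inspect its processes
    cephadm enter --name mon.host1

    # daemon logs via journald (note the fsid in the unit name) ...
    journalctl -u ceph-$fsid@mon.host1.service
    # ... or, if file logging is enabled, under the per-cluster directory
    ls /var/log/ceph/$fsid/

    # cluster-wide view of daemons, and explicit placement if you want it
    ceph orch ps
    ceph orch apply mon --placement="host1 host2 host3"

    # orchestrated upgrade to a specific container image, one daemon at a time
    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10
    ceph orch upgrade status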
Users can also run commands like 'ceph fs volume create foo' and the fs will get created *and* MDS daemons will be started, all in one go. (This would also be possible with a package-based orchestrator implementation if one existed.)

We've been beaten up for years about how complicated and hard Ceph is. Rook and cephadm represent two of the most successful efforts to address usability (and not just because they enable deployment management via the dashboard!), and taking advantage of containers was one expedient way to get to where we needed to go. If users feel strongly about supporting packages, we can get much of the same experience with another package-based orchestrator module. My view, though, is that we have much higher-priority problems to tackle.

sage
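PS: for anyone curious what the "all in one go" part looks like in practice, a rough sketch ('foo' is just an example volume name, and the exact output will differ per cluster):

    # create the filesystem; the orchestrator schedules MDS daemons for it
    ceph fs volume create foo

    # confirm the fs is up and see which MDS daemons were deployed where
    ceph fs status foo
    ceph orch ps | grep mds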