Re: Why you might want packages not containers for Ceph deployments

Hello Sage,

> ...I think that part of this comes down to a learning curve...
> ...cephadm represent two of the most successful efforts to address
> usability...

Somehow it does not look right to me.

There is much more to operating a Ceph cluster than just deploying the
software. Of course that helps in the short run, so people don't leave the
train right at the start of their Ceph journey. But the harder part is what
to do when shit hits the fan and your cluster is down due to some issue,
and then the additional layers of complexity kick in and bite you. Just
saying that day-2 ops is much more important than getting a cluster up and
running. In my belief, no admin wants to dig around in containers and other
abstractions when the single most important part of the whole IT
infrastructure stops working. But that's just my thought; maybe I'm wrong.

In my opinion, the best possible way to run IT software is KISS: keep it
simple, stupid. No additional layers, no abstractions of abstractions, and
good error messages.

For example, the docker topic here is a good showcase:
> Question: If it uses docker and the docker daemon fails, what happens to
> your containers?
> Answer: This is an obnoxious feature of docker

As you can see, you need a lot of knowledge about the abstraction layers to
operate them well. Docker, for example, provides so-called live restore
(https://docs.docker.com/config/containers/live-restore/), which allows you
to stop the daemon without killing your containers. That lets you update
the docker daemon without downtime, but you have to know about it and, of
course, enable it. This can make operating a Ceph cluster harder, not
easier.
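
To illustrate what "you have to know about it and enable it" means in
practice, here is a minimal sketch of turning live restore on (assuming a
systemd-based host; see the linked docs for other setups):

  # /etc/docker/daemon.json
  {
    "live-restore": true
  }

  # reload the daemon config without restarting the running containers
  systemctl reload docker

It is a single flag, but if you don't know it exists, a routine daemon
restart also restarts every Ceph container on that host.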

What about more sophisticated topics, for example performance? Ceph is
already not a fast storage solution and has way too high latency. Does it
help to add containers, instead of going more directly to the hardware and
reducing overhead? Of course you can run SPDK and/or DPDK inside
containers, but does that make it better, faster, or even easier? If you
need high-performance storage today, you can turn to open source
alternatives that are massively cheaper per IO and only minimally more
expensive per GB. I therefore believe that stripping out overhead is also
an important topic for the future of Ceph.
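
For context on "going more directly to the hardware": BlueStore can already
bypass the kernel block layer and drive an NVMe device through SPDK, which
is configured roughly like this in ceph.conf (a sketch only; the exact
selector syntax for bluestore_block_path differs between releases, so check
the BlueStore docs for your version):

  [osd]
  # hand the NVMe device at PCI address 0000:01:00.0 to SPDK's
  # userspace driver instead of the kernel nvme driver
  bluestore_block_path = spdk:trtype:PCIe traddr:0000:01:00.0

Doing the same inside a container typically also means passing the device,
hugepages and the needed capabilities into the container, which is exactly
the kind of extra layer I mean.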

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.verges@xxxxxxxx
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Fri, 18 Jun 2021 at 20:43, Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Following up with some general comments on the main container
> downsides and on the upsides that led us down this path in the first
> place.
>
> Aside from a few minor misunderstandings, it seems like most of the
> objections to containers boil down to a few major points:
>
> > Containers are more complicated than packages, making debugging harder.
>
> I think that part of this comes down to a learning curve and some
> semi-arbitrary changes to get used to (e.g., systemd unit name has
> changed; logs now in /var/log/ceph/$fsid instead of /var/log/ceph).
> Another part of these changes are real hoops to jump through: to
> inspect process(es) inside a container you have to `cephadm enter
> --name ...`; ceph CLI may not be automatically installed on every
> host; stracing or finding coredumps requires extra steps. We're
> continuing to improve the tools etc so please call these things out as
> you see them!
>
> > Security (50 containers -> 50 versions of openssl to patch)
>
> This feels like the most tangible critique.  It's a tradeoff.  We have
> had so many bugs over the years due to varying versions of our
> dependencies that containers feel like a huge win: we can finally test
> and distribute something that we know won't break due to some random
> library on some random distro.  But it means the Ceph team is on the
> hook for rebuilding our containers when the libraries inside the
> container need to be patched.
>
> On the flip side, cephadm's use of containers offers some huge wins:
>
> - Package installation hell is gone.  Previously, ceph-deploy and
> ceph-ansible had thousands of lines of code to deal with the myriad
> ways that packages could be installed and where they could be
> published.  With containers, this now boils down to a single string,
> which is usually just something like "ceph/ceph:v16".  We've grown a
> handful of complexity there to let you log into private registries,
> but otherwise things are so much simpler.  Not to mention what happens
> when package dependencies break.
> - Upgrades/downgrades can be carefully orchestrated. With packages,
> the version change is by host, with a limbo period (and occasional
> SIGBUS) before daemons were restarted.  Now we can run new or patched
> code on individual daemons and avoid an accidental upgrade when a
> daemon restarts.  (Also, running e.g. ceph CLI commands no longer
> error out with a dynamic linker error while the package upgrade itself
> is in progress, something all of our automated upgrade tests have to
> carefully avoid to prevent intermittent failures.)
> - Ceph installations are carefully sandboxed.  Removing/scrubbing ceph
> from a host is trivial as only a handful of directories or
> configuration files are touched.  And we can safely run multiple
> clusters on the same machine without worry about bad interactions
> (mostly great for development, but also handy for users experimenting
> with new features etc).
> - Cephadm deploys a bunch of non-ceph software as well to provide a
> complete storage system, including haproxy and keepalived for HA
> ingress for RGW and NFS, ganesha for NFS service, grafana, prometheus,
> node-exporter, and (soon) samba for SMB.  All neatly containerized to
> avoid bumping into other software on the host; testing and supporting
> the huge matrix of package versions available via various distros
> would be a huge time sink.
>
> Most importantly, cephadm and the orchestrator API vastly improve the
> overall ceph experience from the CLI and dashboard.  Users no longer
> have to give any thought to where and which daemons run if they don't
> want to (or they can carefully specify daemon placement if they
> choose).  And users can use commands like 'ceph fs volume create foo'
> and the fs will get created *and* MDS daemons will be started all in
> one go.  (This would also be possible with a package-based
> orchestrator implementation if one existed.)
>
> We've been beat up for years about how complicated and hard Ceph is.
> Rook and cephadm represent two of the most successful efforts to
> address usability (and not just because they enable deployment
> management via the dashboard!), and taking advantage of containers was
> one expedient way to get to where we needed to go.  If users feel
> strongly about supporting packages, we can get much of the same
> experience with another package-based orchestrator module.  My view,
> though, is that we have much higher priority problems to tackle.
>
> sage
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


