Re: Why you might want packages not containers for Ceph deployments

"Fox, Kevin M" <Kevin.Fox@xxxxxxxx> · Fri, 25 Jun 2021 16:42:39 +0000

Orchestration is hard, especially with every permutation. From the sound of it, the devs have implemented what they feel is the right solution for their own needs. The orchestration was made modular to support non-containerized deployment. It just takes someone to step up and implement the desired permutations. And ultimately that's what open source is geared towards. With open source and some desired feature, you can:
1. Implement it
2. Pay someone else to implement it
3. Convince someone else to implement it in their spare time.

The thread currently seems to be focused on #3, but no developer seems interested in implementing it. So that leaves options 1 and 2?

To move this forward, is anyone interested in developing package support in the orchestration system or paying to have it implemented?

________________________________________
From: Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
Sent: Wednesday, June 2, 2021 2:26 PM
To: Matthew Vernon; ceph-users@xxxxxxx
Subject:  Re: Why you might want packages not containers for Ceph deployments

Hi,

that's also a +1 from me — we also use containers heavily for scientific workflows, and know their benefits well.
But they are not the "best", or rather, the most fitting tool in every situation.
You have provided a great summary and I agree with all points, and thank you a lot for this very competent and concise write-up.

Since both static linking and containers have been mentioned in this lengthy thread as solutions to the problem of many inter-dependencies in production services,
I'd like to add another point to your list of complexities:
* Keeping production systems secure may be a lot more of a hassle.

Even though the following article is long and many may regard it as controversial, I'd like to link to a concise write-up from a packager discussing this topic in a quite generic way:
  https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
While the article discusses the issues of static linking and package management performed in language-specific domains, it applies all the same to containers.

If I operate services in containers built by developers, of course this ensures the setup works, and dependencies are well tested, and even upgrades work well — but it also means that,
at the end of the day, if I run 50 services in 50 different containers from 50 different upstreams, I'll have up to 50 different versions of OpenSSL floating around my production servers.
If a security issue is found in any of the packages used in all the container images, I now need to trust the security teams of all the 50 developer groups building these containers
(and most FOSS projects won't have the resources, understandably...),
instead of the one security team of the distro I use. And then, I also have to re-pull all these containers, after finding out that a security fix has become available.
Or I need to build all these containers myself, and effectively take over the complete job, and have my own security team.
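
To make the version sprawl concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the inventory is hand-written, the image names and OpenSSL versions are made up, and the helper names (group_by_version, images_needing_update) are hypothetical. In practice the inventory would have to be gathered by inspecting each image, and the naive "differs from the fixed version" check is a stand-in for real advisory matching.

```python
from collections import defaultdict

# Hypothetical inventory: image name -> OpenSSL version bundled in that image.
# Names and versions are invented for illustration only.
inventory = {
    "service-a:1.4": "1.1.1k",
    "service-b:2.0": "1.1.1g",
    "service-c:0.9": "1.1.1k",
    "service-d:3.2": "1.0.2u",
}


def group_by_version(images):
    """Return a mapping of library version -> sorted list of images shipping it."""
    groups = defaultdict(list)
    for image, version in images.items():
        groups[version].append(image)
    return {version: sorted(names) for version, names in groups.items()}


def images_needing_update(images, fixed_version):
    """Return images whose bundled version is not the one carrying the fix.

    This is a deliberately naive string comparison, not real version or
    advisory matching.
    """
    return sorted(name for name, version in images.items() if version != fixed_version)


if __name__ == "__main__":
    # One line per distinct version, with the images that bundle it.
    for version, names in sorted(group_by_version(inventory).items()):
        print(version, names)
    # Every image listed here depends on its own upstream to rebuild and re-release.
    print(images_needing_update(inventory, "1.1.1k"))
```

With a single distro-provided library, the first mapping would have exactly one key and the second list would be emptied by one package update; with per-image copies, each entry waits on a separate upstream.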

This may scale somewhat well if you have a team of 50 people and every person takes care of one service. Containers are often your friend in this case[1],
since they allow you to isolate the different responsibilities along with the service.

But this is rarely the case outside of industry, and especially not in academia.
So the approach we chose for ourselves is to have one common OS everywhere, and automate all of our deployment and configuration management with Puppet.
Of course, that puts us in one of the many corners out there, but it scales extremely well to all services we operate,
and I can still trust the distro maintainers to keep the base OS safe on all our servers, automate reboots etc.

For Ceph, we've actually seen questions about security issues already on the list[0] (never answered AFAICT).

To conclude, I strongly believe there's no one size fits all here.

That was why I was hopeful when I first heard about the Ceph orchestrator idea, when it looked to be planned out to be modular,
with the different tasks being implementable in several backends, so one could imagine them being implemented with containers, with classic SSH on bare-metal (i.e. ceph-deploy-like), ansible, rook or maybe others.
Sadly, it seems it ended up being "container-only".
Containers certainly have many uses, and we run thousands of them daily, but neither do they fit each and every existing requirement,
nor are they a magic bullet to solve all issues.

Cheers,
        Oliver

[0] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/PPLJIHT6WKYPDJ45HVJ3Z37375WIGKDW/
[1] But you may also just have a very well structured configuration management system fitting your organizational structure.

Am 02.06.21 um 11:36 schrieb Matthew Vernon:
> Hi,
>
> In the discussion after the Ceph Month talks yesterday, there was a bit of chat about cephadm / containers / packages. IIRC, Sage observed that a common reason in the recent user survey for not using cephadm was that it only worked on containerised deployments. I think he then went on to say that he hadn't heard any compelling reasons why not to use containers, and suggested that resistance was essentially a user education question[0].
>
> I'd like to suggest, briefly, that:
>
> * containerised deployments are more complex to manage, and this is not simply a matter of familiarity
> * reducing the complexity of systems makes admins' lives easier
> * the trade-off of the pros and cons of containers vs packages is not obvious, and will depend on deployment needs
> * Ceph users will benefit from both approaches being supported into the future
>
> We make extensive use of containers at Sanger, particularly for scientific workflows, and also for bundling some web apps (e.g. Grafana). We've also looked at a number of container runtimes (Docker, singularity, charliecloud). They do have advantages - it's easy to distribute a complex userland in a way that will run on (almost) any target distribution; rapid "cloud" deployment; some separation (via namespaces) of network/users/processes.
>
> For what I think of as a 'boring' Ceph deploy (i.e. install on a set of dedicated hardware and then run for a long time), I'm not sure any of these benefits are particularly relevant and/or compelling - Ceph upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud Archive) provide .debs of a couple of different Ceph releases per Ubuntu LTS - meaning we can easily separate out OS upgrade from Ceph upgrade. And upgrading the Ceph packages _doesn't_ restart the daemons[1], meaning that we maintain control over restart order during an upgrade. And while we might briefly install packages from a PPA or similar to test a bugfix, we roll those (test-)cluster-wide, rather than trying to run a mixed set of versions on a single cluster - and I understand this single-version approach is best practice.
>
> Deployment via containers does bring complexity; some examples we've found at Sanger (not all Ceph-related, which we run from packages):
>
> * you now have 2 process supervision points - dockerd and systemd
> * docker updates (via distribution unattended-upgrades) have an unfortunate habit of rudely restarting everything
> * docker squats on a chunk of RFC 1918 space (and telling it not to can be a bore), which coincides with our internal network...
> * there is more friction if you need to look inside containers (particularly if you have a lot running on a host and are trying to find out what's going on)
> * you typically need to be root to build docker containers (unlike packages)
> * we already have package deployment infrastructure (which we'll need regardless of deployment choice)
>
> We also currently use systemd overrides to tweak some of the Ceph units (e.g. to do some network sanity checks before bringing up an OSD), and have some tools to pair OSD / journal / LVM / disk device up; I think these would be more fiddly in a containerised deployment. I'd accept that fixing these might just be a SMOP[2] on our part.
>
> Now none of this is show-stopping, and I am most definitely not saying "don't ship containers". But I think there is added complexity to your deployment from going the containers route, and that is not simply a "learn how to use containers" learning curve. I do think it is reasonable for an admin to want to reduce the complexity of what they're dealing with - after all, much of my job is trying to automate or simplify the management of complex systems!
>
> I can see from a software maintainer's point of view that just building one container and shipping it everywhere is easier than building packages for a number of different distributions (one of my other hats is a Debian developer, and I have a bunch of machinery for doing this sort of thing). But it would be a bit unfortunate if the general thrust of "let's make Ceph easier to set up and manage" was somewhat derailed with "you must use containers, even if they make your life harder".
>
> I'm not going to criticise anyone who decides to use a container-based deployment (and I'm sure there are plenty of setups where it's an obvious win), but if I were advising someone who wanted to set up and use a 'boring' Ceph cluster for the medium term, I'd still advise on using packages. I don't think this makes me a luddite :)
>
> Regards, and apologies for the wall of text,
>
> Matthew
>
> [0] I think that's a fair summary!
> [1] This hasn't always been true...
> [2] Simple (sic.) Matter of Programming
>
>

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--