Hi,
We feel much the same. We had our first Ceph outage ever last week,
after running Ceph since Firefly. We manage 8 tiny Ceph systems in
production and have held off on upgrading to Pacific for quite a while
now, after upgrading 2 clusters.
The outage started with the reboot of two mon nodes due to a Proxmox
cluster issue (no hardware/disk problem otherwise). The nodes were back
after 2 minutes, but the Ceph cluster went haywire and stalled about 7
hours later because 2 mons ran out of free space (there was massive
network traffic in between; it's not clear why). The mons live on the
root partition (SSD, 15GB total) and we had to grow it to more than
100GB to let Ceph recover... now only 8-9GB are used (for system +
mon). It was quite insane, and this is a tiny 15-OSD, 4-node cluster...
The cluster specs have been the same since Firefly (except that 4 HDD
OSDs were removed and SSD OSDs installed). We also had to restart a mgr
daemon that was eating 45GB of RAM (a memory leak, I guess)...
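For what it's worth, we now keep an eye on free space under the mon
data path ourselves. A minimal sketch of such a watchdog is below; the
path and threshold are our own assumptions, not Ceph defaults (Ceph
itself also has mon_data_avail_warn / mon_data_size_warn settings for
this):

```shell
# Watchdog sketch: warn before the mon store fills its partition.
# The directory and threshold below are illustrative assumptions.
check_mon_free() {
    dir="$1"       # directory whose filesystem we check
    min_mb="$2"    # minimum acceptable free space in MB
    # df -Pm: POSIX-portable output in 1 MB blocks; column 4 = available
    free_mb=$(df -Pm "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_mb" -lt "$min_mb" ]; then
        echo "WARNING: only ${free_mb} MB free under ${dir}"
        return 1
    fi
    return 0
}

# From cron, for example (hypothetical path and threshold):
#   check_mon_free /var/lib/ceph/mon 20480 \
#       || logger -p user.warn "mon store low on space"
```

Something this simple would have warned us hours before the mons
actually ran out of space.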
I have the feeling that we read about these kinds of massive surges in
resource usage on this list quite often. We (users/admins) need more
predictable resource usage; there seem to be too many corner cases and
bugs eating RAM/disk.
I'd have expected Ceph to resync the mons in a few seconds and maybe
perform some backfill between OSDs, as that had been our rock-solid
experience until last week...
I just wanted to give a concrete example of what users are
experiencing in the field.
Thanks
On 8/11/21 at 17:59, Francois Legrand wrote:
Hi Franck,
I totally agree with your point 3 (also with 1 and 2 indeed).
Generally speaking, the release cycle of much software tends to get
faster and faster (not only for Ceph, but also OpenStack etc.), and it
is really hard to keep an infrastructure up to date under such
conditions, all the more so when you deal with storage.
As a result, as you explained perfectly, this gives the impression
that the product is not that robust, contains a lot of bugs, needs a
lot of patches, and so on. A few times, upgrades have been released
with obvious bugs or regressions (e.g. the DNS problem in 14.2.12),
which gives the impression that there is a rush to release even if the
fixes are not fully tested... and that leads to a loss of confidence
among users.
I am personally going through this process right now! We wanted to
upgrade our Nautilus cluster. At first we decided to go directly to
Pacific, but looking at the list it appeared to us that Pacific is
absolutely not stable enough to be considered a production release. We
therefore decided to go to Octopus... maybe we will move to Pacific
when v17 is out.
I thus feel that the "latest stable release" (currently Pacific) is in
fact a development release (with the community as the "testing pool"
for it), and that the truly stable release is the n-1 one (Octopus). I
therefore fully support your request for an LTS release with stability
as its main goal.
F.
On 08/11/2021 at 13:21, Frank Schilder wrote:
Hi all,
I followed this thread with great interest and would like to add my
opinion/experience/wishes as well.
I believe the question of packages versus containers needs a bit more
context to be really meaningful. This was already mentioned several
times with regard to documentation. I see the following three topics as
tightly connected (my opinions/answers included):
1. Distribution: Packages are compulsory, containers are optional.
2. Deployment: cephadm (yet another deployment framework) and ceph
(the actual storage system) should be strictly separate projects.
3. Release cycles: The release cadence is way too fast; I very much
miss a ceph LTS branch with at least 10 years of back-port support.
These are my short answers/wishes/expectations in this context. I
will add below some more reasoning as optional reading (warning: wall
of text ahead).
1. Distribution
---------
I don't think the question is really about packages versus containers,
because even if one distribution decides not to package ceph any more,
other distributors certainly will, and the user community will simply
move away from distributions without ceph packages. In addition, unless
Red Hat plans to move to a source-only container where I run the good
old configure - make - make install, it will be package-based anyway,
so packages are here to stay.
Therefore, the way I understand it, this question is about cephadm
versus other deployment methods. Here, I think the push towards a
container-based, cephadm-only deployment is unlikely to become the
no. 1 choice for everyone, for good reasons already mentioned in
earlier messages. In addition, I also believe that development of a
general deployment tool is currently not sustainable, as another user
mentioned. My reasons for this are given in the next section.
2. Deployment
---------
In my opinion, it is really important to distinguish three components
of any open-source project: development (release cycles), distribution
and deployment. Following the good old philosophy that every tool does
exactly one job and does it well, each of these components should be a
separate project, because they correspond to different tools.
This implies immediately that the ceph documentation should not
contain documentation about packaging and deployment tools. Each of
these ought to be strictly separate. If I have a low-level problem with
ceph and go to the ceph documentation, I do not want to see cephadm
commands. Ceph documentation should be about ceph (the storage system)
only. This mix-up is already leading to problems: there have been
ceph-users cases where people could not use the documentation for
troubleshooting, because it showed cephadm commands but their cluster
was not cephadm-deployed.
In this context, I would prefer a separate cephadm-users list so that
ceph-users can focus on actual ceph problems again.
Now to the point that cephadm might be an unsustainable project.
Although at first glance the idea of a generic deployment tool that
solves all problems with a single command might look appealing, it is
likely doomed to fail for a simple reason that was already indicated in
an earlier message: ceph deployment is subject to a complexity paradox.
Ceph has a very large configuration space, and implementing and using a
generic tool that covers and understands this configuration space is
more complex than deploying any specific ceph cluster, each of which
uses only a tiny subset of the entire configuration space.
In other words: deploying a specific ceph cluster is actually not
that difficult.
Designing, and dimensioning all components of, a ceph cluster is
difficult, and none of the current deployment tools help here. There is
not even a check for suitable hardware. In addition, technology is
moving fast, and adapting a generic tool to new developments in time
seems a hopeless task. For example, when will cephadm natively support
collocated LVM OSDs with dm-cache devices? Is it even worth trying to
incorporate this?
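To illustrate the complexity involved, this is roughly the manual
command sequence such support would have to replace. Everything here is
a sketch: the device names are hypothetical, the script only prints the
commands unless DRY_RUN=0, and the lvconvert --cachevol form requires a
reasonably recent lvm2.

```shell
#!/bin/sh
# Sketch of the manual steps a deployment tool would have to cover for
# a dm-cache backed OSD. Device names are hypothetical; run for real
# (DRY_RUN=0) only on disposable hardware.
set -eu
DATA_DEV="${DATA_DEV:-/dev/sdb}"          # slow HDD for the OSD data
CACHE_DEV="${CACHE_DEV:-/dev/nvme0n1p1}"  # fast device for dm-cache
DRY_RUN="${DRY_RUN:-1}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run vgcreate osd-vg "$DATA_DEV" "$CACHE_DEV"
run lvcreate -l 95%PVS -n osd-data osd-vg "$DATA_DEV"
run lvcreate -l 90%PVS -n osd-cache osd-vg "$CACHE_DEV"
# attach the fast LV as a dm-cache layer in front of the data LV
run lvconvert --type cache --cachevol osd-cache osd-vg/osd-data
# hand the cached LV to ceph as a plain logical volume
run ceph-volume lvm create --data osd-vg/osd-data
```

And this still ignores sizing, failure handling, and teardown; a
generic tool would have to understand all of it for every such layout.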
My wish would be to keep the ceph project clean of any deployment
tasks. In my opinion, the basic ceph tooling already performs tasks
that are the responsibility of a configuration management system, not a
storage system (e.g. deploying unit files by default instead of as an
option that is disabled by default).
3. Release cycles
---------
Ceph is a complex system and the code is getting more complex every
day. It is very difficult to beat the curse of complexity that
development and maintenance effort grows non-linearly
(exponentially?) with the number of lines of code. As a consequence,
(A) if one wants to maintain quality while adding substantial new
features, the release intervals become longer and longer. (B) If one
wants to maintain constant release intervals while adding substantial
new features, the quality will have to go down. The last option is
that (C) new releases with constant release intervals contain ever
smaller increments in functionality to maintain quality. I leave aside
the option of throwing ever more qualified developers at the project,
as this seems unlikely and also comes with its own complexity cost.
I'm afraid we are in scenario B. Ceph is losing its aura of being a
rock-solid system.
Just recently, there were some ceph-users emails about how dangerous
it is (or not) to upgrade to the latest stable Octopus version. The
upgrade itself apparently goes well, but what happens then? I
personally have too many reports that the latest ceph versions are
quite touchy and collapse in situations that were never a problem up to
Mimic (most prominently, that a simple rebalance operation after adding
disks makes OSDs flap and can take a whole cluster down; there have
been plenty of cases since Nautilus). Stability at scale seems to be
becoming a real issue with increasing version numbers. I'm very
hesitant to upgrade myself, in particular because there is no way back
and the cycles of potential doom are so short.
Therefore, I would very much appreciate the creation of a ceph LTS
branch with at least 10 years of back-port support, if not longer. In
addition, upgrade procedures between LTS versions should also allow a
downgrade by one version (carrying legacy data along until explicitly
allowed to cut all bridges). For any large storage system, robustness,
predictability and low maintenance effort are invaluable.
For example, our cluster is very demanding compared with our other
storage systems: the OSDs have a nasty memory leak, operations get
stuck in MONs and MDSes at least once or twice a week due to race
conditions, and so on. It is currently not possible to let the cluster
run unattended for months or even years, something that is possible, if
not the rule, with other (also open-source) storage systems.
Fixing bugs that show up rarely and are very difficult to catch is
really important for a storage system with theoretically infinite
uptime. Rolling versions over all the time and then throwing "xyz is
not supported, try a newer version" at users when they discover a rare
problem after running for a few years does not help get ceph to a level
of stability that will be convincing in the long run.
I understand that implementing new features is more fun than bug
fixing. However, bug fixing is what makes users trust a platform. I see
too many people around me losing faith in ceph at the moment and
starting to treat it as a second- or third-class storage system. This
is largely due to the short support interval relative to the actual
complexity of the software. Establishing an LTS branch could win back
sceptical admins who have started looking for alternatives.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/