Re: Why you might want packages not containers for Ceph deployments

Hi,

We feel much the same. Last week we had our first Ceph outage ever, after using Ceph since Firefly. We manage 8 tiny Ceph systems in production and, after upgrading 2 clusters to Pacific, have held off on further upgrades for quite a while.

The outage started with the reboot of two mon nodes due to a Proxmox cluster issue (no hardware/disk issue otherwise). The nodes were back after 2 minutes, but the Ceph cluster went nuts and stalled for some 7 hours after that because 2 mons ran out of free disk space (there was massive network traffic in between, and it is not clear why). The mons are on the root partition (SSD, 15GB total) and we had to increase that to more than 100GB to allow Ceph to recover... now only 8-9GB are used (for system + mon). It was quite insane, and this is a tiny 4-node cluster with 15 OSDs in total... The cluster specs have been the same since Firefly (except that 4 HDD OSDs were removed and SSD OSDs installed). We also had to restart a mgr daemon that was eating 45GB of RAM (a memory leak, I guess)...
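As a rough illustration of the kind of check that could flag this situation early, here is a minimal Python sketch that reports free space on the partition holding the mon data. The path "/" reflects the root-partition layout described above; the function name and the 5 GiB threshold are hypothetical choices for illustration, not Ceph defaults or recommendations.

```python
import shutil


def free_space_report(path, min_free_bytes):
    """Return free/total bytes for the filesystem containing `path`
    and whether the free space meets the given minimum."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "ok": usage.free >= min_free_bytes,
    }


if __name__ == "__main__":
    # Assumption: the mon store lives on the root partition, as in the
    # cluster described above. The threshold is an arbitrary example.
    MIN_FREE_BYTES = 5 * 1024**3  # hypothetical 5 GiB floor
    report = free_space_report("/", MIN_FREE_BYTES)
    status = "OK" if report["ok"] else "LOW"
    print(f"{report['path']}: {report['free_bytes']} bytes free "
          f"of {report['total_bytes']} [{status}]")
```

Run periodically (for example from cron), a check like this would only give warning of a filling partition; it does nothing about whatever made the mon store grow in the first place.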

I have the feeling that we read about these kinds of massive surges in resource usage on this list quite often. We (users/admins) need more predictable resource usage; there seem to be too many corner cases and bugs eating RAM/disk.

I'd have expected Ceph to resync the mons within seconds and maybe perform some backfill between OSDs, as that has been our rock-solid experience until last week...

I just wanted to give a concrete example of what users are experiencing in the field.

Thanks


On 8/11/21 at 17:59, Francois Legrand wrote:
Hi Franck,

I totally agree with your point 3 (and indeed with 1 and 2 as well). Generally speaking, the release cycle of many software projects tends to become faster and faster (not only for ceph, but also openstack etc.), and it is really hard and tricky to keep an infrastructure up to date under such conditions, even more so when you deal with storage. As a result, as you explained perfectly, this gives the impression that the product is not that robust, contains a lot of bugs, needs a lot of patches, etc. A few times, upgrades have been released with obvious bugs or regressions (e.g. the DNS problem in 14.2.12), and this gives the impression that there is an urge to release even if the fixes are not fully tested... which leads to a loss of confidence among users.

And I am personally in the middle of this process! We wanted to upgrade our Nautilus cluster. At first we decided to go directly to Pacific, but from reading the list it appeared to us that Pacific is absolutely not stable enough to be considered a production release. We thus decided to go to Octopus... maybe we will go to Pacific once v17 is out.

I thus feel that the "latest stable release" (currently Pacific) is in fact a development release (with the community as the "testing pool" for that release) and that the truly stable release is the n-1 one (Octopus). I therefore fully support your request for an LTS release with stability as its main goal.

F.



On 08/11/2021 at 13:21, Frank Schilder wrote:
Hi all,

I followed this thread with great interest and would like to add my opinion/experience/wishes as well.

I believe the question of packages versus containers needs a bit more context to be really meaningful. This was already mentioned several times with regard to documentation. I see the following three topics as tightly connected (my opinion/answers included):

1. Distribution: Packages are compulsory, containers are optional.
2. Deployment: ceph-adm (yet another deployment framework) and ceph (the actual storage system) should be strictly different projects.
3. Release cycles: The release cadence is way too fast; I very much miss a ceph LTS branch with at least 10 years of back-port support.

These are my short answers/wishes/expectations in this context. I will add below some more reasoning as optional reading (warning: wall of text ahead).


1. Distribution
---------

I don't think the question is about packages versus containers, because even if a distribution should decide not to package ceph any more, other distributors certainly will, and the user community will just move away from distributions without ceph packages. In addition, unless Red Hat plans to move to a source-only container where I run the good old configure - make - make install, it will be package based anyway, so packages are here to stay.

Therefore, as I understand it, this question is about ceph-adm versus other deployment methods. Here, I think the push towards a container-based, ceph-adm-only deployment is unlikely to become the no. 1 choice for everyone, for good reasons already mentioned in earlier messages. In addition, I also believe that development of a general deployment tool is currently not sustainable, as was mentioned by another user. My reasons for this are given in the next section.


2. Deployment
---------

In my opinion, it is really important to distinguish three components of any open-source project: development (release cycles), distribution and deployment. Following the good old philosophy that every tool does exactly one job and does it well, each of these components is a separate project, because they correspond to different tools.

This immediately implies that the ceph documentation should not contain documentation about packaging and deployment tools. Each of these ought to be strictly separate. If I have a low-level problem with ceph and go to the ceph documentation, I do not want to see ceph-adm commands. Ceph documentation should be about ceph (the storage system) only. Such a mix-up leads to problems, and there have already been ceph-users cases where people could not use the documentation for troubleshooting because it showed ceph-adm commands but their cluster was not deployed with ceph-adm.

In this context, I would prefer if there was a separate ceph-adm-users list so that ceph-users can focus on actual ceph problems again.

Now to the point that ceph-adm might be an unsustainable project. Although at first glance the idea of a generic deployment tool that solves all problems with a single command might look appealing, it is likely doomed to fail for a simple reason that was already indicated in an earlier message: ceph deployment is subject to a complexity paradox. Ceph has a very large configuration space, and implementing and using a generic tool that covers and understands this configuration space is more complex than deploying any specific ceph cluster, each of which uses only a tiny subset of the entire configuration space.

In other words: deploying a specific ceph cluster is actually not that difficult.

Designing a ceph cluster and dimensioning all of its components is what is difficult, and none of the current deployment tools helps here. There is not even a check for suitable hardware. In addition, technology is moving fast, and adapting a generic tool to new developments in time seems a hopeless task. For example, when will ceph-adm natively support collocated lvm OSDs with dm_cache devices? Is it even worth trying to incorporate this?

My wish would be to keep the ceph project clean of any deployment tasks. In my opinion, the basic ceph tooling is already doing tasks that are the responsibility of a configuration management system and not of a storage system (e.g. deploying unit files by default instead of as an option that is disabled by default).


3. Release cycles
---------

Ceph is a complex system and the code is getting more complex every day. It is very difficult to beat the curse of complexity: development and maintenance effort grows non-linearly (exponentially?) with the number of lines of code. As a consequence, (A) if one wants to maintain quality while adding substantial new features, the release intervals become longer and longer. (B) If one wants to maintain constant release intervals while adding substantial new features, the quality will have to go down. The last option is that (C) new releases with constant release intervals contain ever smaller increments in functionality in order to maintain quality. I ignore the option of throwing more and more qualified developers at the project, as this seems unlikely and also comes with its own complexity cost.

I'm afraid we are in scenario B. Ceph is losing its aura of being a rock-solid system.

Just recently, there were some ceph-users emails about how dangerous it is (or is not) to upgrade to the latest stable octopus version. The upgrade itself apparently goes well, but what happens then? I personally have seen too many reports that the latest ceph versions are quite touchy and collapse in situations that were never a problem up to mimic (most prominently, that a simple rebalance operation after adding disks gets OSDs to flap and can take a whole cluster down; plenty of cases since nautilus). Stability at scale seems to become a real issue with increasing version numbers. I myself am very hesitant to upgrade, in particular because there is no way back and the cycles of potential doom are so short.

Therefore, I would very much appreciate the establishment of a ceph-LTS branch with at least 10 years of back-port support, if not longer. In addition, upgrade procedures between LTS versions should allow a downgrade by one version as well (move legacy data along until explicitly allowed to cut all bridges). For any large storage system, robustness, predictability and low maintenance effort are invaluable. For example, our cluster is very demanding compared with our other storage systems: the OSDs have a nasty memory leak, operations get stuck in MONs and MDSes at least once or twice a week due to race conditions, and so on. It is currently not possible to let the cluster run unattended for months or even years, something that is possible, if not the rule, with other (also open-source) storage systems.

Fixing bugs that show up rarely and are very difficult to catch is really important for a storage system with theoretically infinite uptime. Rolling versions over all the time and then throwing "xyz is not supported, try with a newer version" at users when they discover a rare problem after running for a few years does not help to get ceph to a level of stability that will be convincing enough in the long run.

I understand that implementing new features is more fun than bug fixing. However, bug fixing is what makes users trust a platform. I see too many people around me losing faith in ceph at the moment and starting to treat it as a second- or third-class storage system. This is largely due to the short support interval given the actual complexity of the software. Establishing an LTS branch could win back sceptical admins who have started looking for alternatives.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
