Hi,
We feel much the same. We had our first Ceph outage ever last week,
after running Ceph since Firefly. We manage 8 tiny Ceph systems in
production and have held off on upgrading to Pacific for quite a while
now, after upgrading 2 clusters.
The outage started with the reboot of two mon nodes due to a Proxmox
cluster issue (no hardware/disk problem otherwise). The nodes were back
after 2 minutes, but the Ceph cluster went haywire and stalled about 7
hours later because 2 mons ran out of free space (there was massive
network traffic in between; it's not clear why). The mons live on the
root partition (SSD, 15GB total) and we had to grow it to more than
100GB to let Ceph recover... now only 8-9GB are used (for system +
mon). It was quite insane, and this is a tiny 15-OSD, 4-node cluster...
The cluster specs have been the same since Firefly (except that 4 HDD
OSDs were removed and SSD OSDs installed). We also had to restart a mgr
daemon that was eating 45GB of RAM (a memory leak, I guess)...
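For what it's worth, we now keep an eye on free space under the mon
data path ourselves. A minimal sketch of such a watchdog is below; the
path and threshold are our own assumptions, not Ceph defaults (Ceph
itself also has mon_data_avail_warn / mon_data_size_warn settings for
this):

```shell
# Watchdog sketch: warn before the mon store fills its partition.
# The directory and threshold below are illustrative assumptions.
check_mon_free() {
    dir="$1"       # directory whose filesystem we check
    min_mb="$2"    # minimum acceptable free space in MB
    # df -Pm: POSIX-portable output in 1 MB blocks; column 4 = available
    free_mb=$(df -Pm "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_mb" -lt "$min_mb" ]; then
        echo "WARNING: only ${free_mb} MB free under ${dir}"
        return 1
    fi
    return 0
}

# From cron, for example (hypothetical path and threshold):
#   check_mon_free /var/lib/ceph/mon 20480 \
#       || logger -p user.warn "mon store low on space"
```

Something this simple would have warned us hours before the mons
actually ran out of space.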
I have the feeling that we read about these kinds of massive surges in
resource usage on this list quite often. We (users/admins) need more
predictable resource usage; there seem to be too many corner cases and
bugs eating RAM/disk.
I'd have expected Ceph to resync the mons in a few seconds and maybe
perform some backfill between OSDs, as that had been our rock-solid
experience until last week...
I just wanted to give a concrete example of what users are
experiencing in the field.
Thanks
On 8/11/21 at 17:59, Francois Legrand wrote:
Hi Franck,
I totally agree with your point 3 (also with 1 and 2 indeed).
Generally speaking, the release cycle of much software tends to get
faster and faster (not only for Ceph, but also OpenStack etc.), and it
is really hard to keep an infrastructure up to date under such
conditions, all the more so when you deal with storage.
As a result, as you explained perfectly, this gives the impression
that the product is not that robust, contains a lot of bugs, needs a
lot of patches, and so on. A few times, upgrades have been released
with obvious bugs or regressions (e.g. the DNS problem in 14.2.12),
which gives the impression that there is a rush to release even if the
fixes are not fully tested... and that leads to a loss of confidence
among users.
I am personally going through this process right now! We wanted to
upgrade our Nautilus cluster. At first we decided to go directly to
Pacific, but looking at the list it appeared to us that Pacific is
absolutely not stable enough to be considered a production release. We
therefore decided to go to Octopus... maybe we will move to Pacific
when v17 is out.
I thus feel that the "latest stable release" (currently Pacific) is in
fact a development release (with the community as the "testing pool"
for it), and that the truly stable release is the n-1 one (Octopus). I
therefore fully support your request for an LTS release with stability
as its main goal.
F.
On 08/11/2021 at 13:21, Frank Schilder wrote:
Hi all,
I followed this thread with great interest and would like to add my
opinion/experience/wishes as well.
I believe the question of packages versus containers needs a bit more
context to be really meaningful. This was already mentioned several
times with regard to documentation. I see the following three topics as
tightly connected (my opinions/answers included):
1. Distribution: Packages are compulsory, containers are optional.
2. Deployment: cephadm (yet another deployment framework) and ceph
(the actual storage system) should be strictly separate projects.
3. Release cycles: The release cadence is way too fast; I very much
miss a ceph LTS branch with at least 10 years of back-port support.
These are my short answers/wishes/expectations in this context. I
will add below some more reasoning as optional reading (warning: wall
of text ahead).
1. Distribution
---------
I don't think the question is really about packages versus containers,
because even if one distribution decides not to package ceph any more,
other distributors certainly will, and the user community will simply
move away from distributions without ceph packages. In addition, unless
Red Hat plans to move to a source-only container where I run the good
old configure - make - make install, it will be package-based anyway,
so packages are here to stay.
Therefore, the way I understand it, this question is about cephadm
versus other deployment methods. Here, I think the push towards a
container-based, cephadm-only deployment is unlikely to become the
no. 1 choice for everyone, for good reasons already mentioned in
earlier messages. In addition, I also believe that development of a
general deployment tool is currently not sustainable, as another user
mentioned. My reasons for this are given in the next section.
2. Deployment
---------
In my opinion, it is really important to distinguish three components
of any open-source project: development (release cycles), distribution
and deployment. Following the good old philosophy that every tool does
exactly one job and does it well, each of these components should be a
separate project, because they correspond to different tools.
This implies immediately that the ceph documentation should not
contain documentation about packaging and deployment tools. Each of
these ought to be strictly separate. If I have a low-level problem with
ceph and go to the ceph documentation, I do not want to see cephadm
commands. Ceph documentation should be about ceph (the storage system)
only. This mix-up is already leading to problems: there have been
ceph-users cases where people could not use the documentation for
troubleshooting, because it showed cephadm commands but their cluster
was not cephadm-deployed.
In this context, I would prefer a separate cephadm-users list so that
ceph-users can focus on actual ceph problems again.
Now to the point that cephadm might be an unsustainable project.
Although at first glance the idea of a generic deployment tool that
solves all problems with a single command might look appealing, it is
likely doomed to fail for a simple reason that was already indicated in
an earlier message: ceph deployment is subject to a complexity paradox.
Ceph has a very large configuration space, and implementing and using a
generic tool that covers and understands this configuration space is
more complex than deploying any specific ceph cluster, each of which
uses only a tiny subset of the entire configuration space.
In other words: deploying a specific ceph cluster is actually not
that difficult.
Designing, and dimensioning all components of, a ceph cluster is
difficult, and none of the current deployment tools help here. There is
not even a check for suitable hardware. In addition, technology is
moving fast, and adapting a generic tool to new developments in time
seems a hopeless task. For example, when will cephadm natively support
collocated LVM OSDs with dm-cache devices? Is it even worth trying to
incorporate this?
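To illustrate the complexity involved, this is roughly the manual
command sequence such support would have to replace. Everything here is
a sketch: the device names are hypothetical, the script only prints the
commands unless DRY_RUN=0, and the lvconvert --cachevol form requires a
reasonably recent lvm2.

```shell
#!/bin/sh
# Sketch of the manual steps a deployment tool would have to cover for
# a dm-cache backed OSD. Device names are hypothetical; run for real
# (DRY_RUN=0) only on disposable hardware.
set -eu
DATA_DEV="${DATA_DEV:-/dev/sdb}"          # slow HDD for the OSD data
CACHE_DEV="${CACHE_DEV:-/dev/nvme0n1p1}"  # fast device for dm-cache
DRY_RUN="${DRY_RUN:-1}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run vgcreate osd-vg "$DATA_DEV" "$CACHE_DEV"
run lvcreate -l 95%PVS -n osd-data osd-vg "$DATA_DEV"
run lvcreate -l 90%PVS -n osd-cache osd-vg "$CACHE_DEV"
# attach the fast LV as a dm-cache layer in front of the data LV
run lvconvert --type cache --cachevol osd-cache osd-vg/osd-data
# hand the cached LV to ceph as a plain logical volume
run ceph-volume lvm create --data osd-vg/osd-data
```

And this still ignores sizing, failure handling, and teardown; a
generic tool would have to understand all of it for every such layout.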
My wish would be to keep the ceph project clean of any deployment
tasks. In my opinion, the basic ceph tooling already performs tasks
that are the responsibility of a configuration management system, not a
storage system (e.g. deploying unit files by default instead of as an
option that is disabled by default).
3. Release cycles
---------
Ceph is a complex system and the code is getting more complex every
day. It is very difficult to beat the curse of complexity that
development and maintenance effort grows non-linearly
(exponentially?) with the number of lines of code. As a consequence,
(A) if one wants to maintain quality while adding substantial new
features, the release intervals become longer and longer. (B) If one
wants to maintain constant release intervals while adding substantial
new features, the quality will have to go down. The last option is
that (C) new releases with constant release intervals contain ever
smaller increments in functionality to maintain quality. I leave aside
the option of throwing ever more qualified developers at the project,
as this seems unlikely and also comes with its own complexity cost.
I'm afraid we are in scenario B. Ceph is losing its aura of being a
rock-solid system.
Just recently, there were some ceph-users emails about how dangerous
it is (or not) to upgrade to the latest stable Octopus version. The
upgrade itself apparently goes well, but what happens then? I
personally have too many reports that the latest ceph versions are
quite touchy and collapse in situations that were never a problem up to
Mimic (most prominently, that a simple rebalance operation after adding
disks makes OSDs flap and can take a whole cluster down; there have
been plenty of cases since Nautilus). Stability at scale seems to be
becoming a real issue with increasing version numbers. I'm very
hesitant to upgrade myself, in particular because there is no way back
and the cycles of potential doom are so short.
Therefore, I would very much appreciate the creation of a ceph LTS
branch with at least 10 years of back-port support, if not longer. In
addition, upgrade procedures between LTS versions should also allow a
downgrade by one version (carrying legacy data along until explicitly
allowed to cut all bridges). For any large storage system, robustness,
predictability and low maintenance effort are invaluable.
For example, our cluster is very demanding compared with our other
storage systems: the OSDs have a nasty memory leak, operations get
stuck in MONs and MDSes at least once or twice a week due to race
conditions, and so on. It is currently not possible to let the cluster
run unattended for months or even years, something that is possible, if
not the rule, with other (also open-source) storage systems.
Fixing bugs that show up rarely and are very difficult to catch is
really important for a storage system with theoretically infinite
uptime. Rolling versions over all the time and then throwing "xyz is
not supported, try a newer version" at users when they discover a rare
problem after running for a few years does not help get ceph to a level
of stability that will be convincing in the long run.
I understand that implementing new features is more fun than bug
fixing. However, bug fixing is what makes users trust a platform. I see
too many people around me losing faith in ceph at the moment and
starting to treat it as a second- or third-class storage system. This
is largely due to the short support interval relative to the actual
complexity of the software. Establishing an LTS branch could win back
sceptical admins who have started looking for alternatives.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/