Re: Why you might want packages not containers for Ceph deployments

Hi Everyone.

I thought I'd put in my five cents, as I believe this is an interesting
topic. I'm also a newbie, having run a cluster for only about a year. I did
some research before that and have also created a couple of videos on the
topic, one of them on upgrading a cluster using cephadm.

------------- ABOUT MY SETUP ------------
Currently, I manage a cluster with ten hosts and 29 OSDs, which is not that
large but is critical for our operations, as it is the backbone of our web
application. We made the move in a hurry when we realized that the disk
drive in the machine hosting the application was too slow to handle all the
requests, and we also needed more compute power and distribution to ensure
fault tolerance. This led to us buying hardware, moving all the data, and
migrating clusters in one marathon upgrade.

After that experience, I was delighted with the stability that the Ceph
solution gave us, and it has been working quite well since then. To do more
research and prepare for the future, my company bought a couple of machines
for my home, so I now have a small cluster with four hosts / four OSDs at
home to store my backups of YouTube video material and to try out new
technologies.
------------- ABOUT MY SETUP END ------------

Now back to the experience of using cephadm. I installed a test cluster
locally with nine hosts in a VirtualBox environment, running Debian.
Setting up cephadm was pretty straightforward, and doing the upgrade was
also "easy". But I was not fond of it at all, as I felt that I lost
control. I had set up a couple of machines with different hardware profiles
to run various services on each, and when I added the hosts to the cluster
and deployed services, cephadm chose to put things on machines not well
suited to that kind of work. Furthermore, when running the upgrade you got
a single line of text on the current progress, so I felt I was not in
control of what was happening.
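
As far as I understand, cephadm does let you steer placement with host
labels and follow an upgrade in a bit more detail than that single line;
roughly something like this (an untested sketch, and the hostnames and
label names here are just made-up examples):

  # label the hosts that should carry a given role
  ceph orch host label add node1 mon
  ceph orch host label add node2 osd

  # pin a service to the labelled hosts instead of letting cephadm pick
  ceph orch apply mon --placement="label:mon"

  # check upgrade progress and follow the cephadm log
  ceph orch upgrade status
  ceph -W cephadm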

Currently, I run the packages built for Debian and use the same operating
system and packages on all machines, and upgrading the cluster is as easy
as running apt update and apt upgrade. After a reboot, that machine is
done. By doing this in the correct order, you have complete control, and if
anything goes wrong along the way, you can handle it machine by machine. I
understand that this works well for a small cluster with fewer than ten
hosts, as in my case, and might not be feasible if you have a server park
with 1000 servers. But then again, controlling and managing your cluster is
part of the work, so perhaps you don't want an automatic solution there
either.
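
For me the per-host routine looks roughly like this (a simplified sketch;
the usual order from the release notes applies, typically mons and mgrs
first, then OSDs, then MDS/RGW):

  ceph osd set noout          # once, before touching the OSD hosts
  apt update && apt upgrade   # pull in the new Ceph packages on this host
  reboot                      # or just restart the Ceph daemons here
  ceph -s                     # wait for HEALTH_OK before the next host
  ceph versions               # confirm which daemons run the new version
  ceph osd unset noout        # once, after all OSD hosts are done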

Another minor issue is that Docker adds complexity and takes some resources
that you might want to give to the cluster instead. What comes to mind is a
setup running hundreds of OSD hosts on Raspberry Pi blades in a rack over
PoE+. I also saw a solution that mounts 16 Pis in 1U with M.2 ports for a
large SSD each, which could be a fun basis for a cluster.

Best regards
Daniel

On Tue, Aug 17, 2021 at 8:09 PM Andrew Walker-Brown <
andrew_jbrown@xxxxxxxxxxx> wrote:

> Hi,
>
> I’m coming at this from the position of a newbie to Ceph.  I had some
> experience of it as part of Proxmox, but not as a standalone solution.
>
> I really don’t care whether Ceph is containerized or not; I don’t have the
> depth of knowledge or experience to argue it either way.  I can see that
> containers may well offer a more consistent deployment scenario with fewer
> dependencies on the external host OS.  Upgrades/patches to the host OS may
> not impact the container deployment etc., with the two systems not held in
> lock-step.
>
> The challenge for me hasn’t been Ceph itself. Ceph has worked
> brilliantly: I have a fully resilient architecture split between two active
> datacentres, and my storage can survive up to 50% node/OSD hardware failure.
>
> No, the challenge has been documentation.  I’ve run off down multiple
> rabbit holes trying to find solutions to problems or just background
> information.  I’ve been tripped up by not spotting the Ceph documentation
> was “v: latest” rather than “v: octopus”...so features didn’t exist or
> commands were structured slightly differently.
>
> It also often isn’t obvious whether the bit of documentation I’m looking
> at relates to a native Ceph package deployment or a container one.  Plus
> you get the Ceph/Suse/Redhat/Proxmox/IBM etc. flavour of answer depending
> on which Google link you click.  Yes I know, it’s part of the joy of
> working with open source... but still, not what you need when a chunk of
> infrastructure has failed and you don’t know why.
>
> I’m truly in awe of what the Ceph community has produced and is planning
> for the future, so don’t think I’m any kind of hater.
>
> My biggest request is for the documentation to take on some
> restructuring.  Keep the different deployment methods documented
> separately: yes, an intro covering the various options and recommendations
> is great, but then keep each one entirely discrete.
>
> Then when a feature/function is documented, make it clear whether it
> applies to a packaged or container deployment etc.  E.g. Zabbix (we use
> Zabbix): lovely documentation on how to integrate Ceph and Zabbix... until
> you finally find out it’s not supported with containers, via a forum post
> and an RFE/bug entry.
>
> And thank you to all the support in the community, REALLY appreciated.
>
> Best,
>
> Andrew
>
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
>
> From: Erik Lindahl<mailto:erik.lindahl@xxxxxxxxx>
> Sent: 17 August 2021 16:01
> To: Marc<mailto:Marc@xxxxxxxxxxxxxxxxx>
> Cc: Nico Schottelius<mailto:nico.schottelius@xxxxxxxxxxx>; Kai
> Börnert<mailto:kai.boernert@xxxxxxxxx>; ceph-users<mailto:
> ceph-users@xxxxxxx>
> Subject:  Re: Why you might want packages not containers for
> Ceph deployments
>
> Hi,
>
> Whether containers are good or not is a separate discussion where I
> suspect there won't be consensus in the near future.
>
> However, after just having looked at the documentation again, my main
> point would be that when a major stable open source project recommends a
> specific installation method (=cephadm) first in the "getting started"
> guide, users are going to expect that's the alternative things are
> documented for, which isn't quite the case for cephadm (yet).
>
> Most users will probably accept either solution as long as there is ONE
> clear & well-documented way of working with ceph - but the current setup of
> even having the simple (?) getting started guide talk about at least three
> different ways without clearly separating their documentation seems like a
> guarantee for long-term confusion and higher entry barriers for new users,
> which I assume is the opposite of the goal of cephadm!
>
> Cheers,
>
> Erik
>
>
> Erik Lindahl <erik.lindahl@xxxxxxxxxxxxx>
> Professor of Biophysics
> Science for Life Laboratory
> Stockholm University & KTH
> Office (SciLifeLab): +46 8 524 81567
> Cell (Sweden): +46 73 4618050
> Cell (US): +1 (650) 924 7674
>
>
>
> > On 17 Aug 2021, at 16:29, Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
> >
> > 
> >>
> >>
> >> Again, this is meant as hopefully constructive feedback rather than
> >> complaints, but the feeling I get after having had fairly smooth
> >> operations with raw packages (including fixing previous bugs leading to
> >> severe crashes) and lately grinding our teeth a bit over cephadm is that
> >> it has helped automate a bunch of stuff that wasn't particularly
> >> difficult (it's nice to issue an update with a single command, but it
> >> works perfectly fine manually too) at the cost of making it WAY more
> >> difficult to fix things (not to mention simply get information about the
> >> cluster) when we have problems - and in the long run that's not a trade-
> >> off I'm entirely happy with :-)
> >>
> >
> > Everyone can only agree on keeping things simple. I honestly do not even
> know why you want to try cephadm. The containerized solution was developed
> to replace ceph-deploy, ceph-ansible etc. as a way to make Ceph
> installation easier for new users. That is exactly the reason (imho) why
> you should not use the containerized environment: a containerized
> environment does not have easy deployment as its primary task. And because
> the focus is on easy deployment, the real characteristics of the
> containerized environment are being ignored during this development. For
> example, you must be out of your mind to create a dependency between
> ceph-osd/mds/mon/all and dockerd.
> >
> > Ten years(?) ago the people at Mesos thought the Docker containerizer was
> 'flaky' and created their own more stable containerizer. And still today,
> containers are killed if dockerd is terminated - something some users had
> to learn the hard way, as recently posted here.
> >
> > Today's container solutions are not at the level where you can say you
> need absolutely no knowledge of them to fix issues. That means you will
> always need knowledge of the container solution plus Ceph to troubleshoot,
> and that is of course more knowledge than just knowing Ceph.
> >
> > I would not be surprised if cephadm ends up like ceph deploy/ansible.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



