Hello Dave,

> The potential to lose or lose access to millions of files/objects or
> petabytes of data is enough to keep you up at night.

> Many of us out here have become critically dependent on Ceph storage,
> and probably most of us can barely afford our production clusters, much
> less a test cluster.

Please remember, free software still comes with a price. You cannot expect someone to work on your individual problem while being cheap about your highly critical data. If your data has value, then you should invest in ensuring data safety. There are companies out there paying Ceph developers and fixing bugs, so your problem will be gone as soon as you A) contribute code yourself or B) pay someone to contribute code.

Don't get me wrong, every dev here should focus on delivering rock-solid work, and I believe they do, but in the end it's software, and software will never be free of bugs. Ceph does quite a good job of protecting your data, and in my personal experience, if you don't do crazy stuff and execute even crazier commands with "yes-i-really-mean-it", you usually don't lose data.

> The real point here: From what I'm reading in this mailing list it
> appears that most non-developers are currently afraid to risk an
> upgrade to Octopus or Pacific. If this is an accurate perception then
> THIS IS THE ONLY PROBLEM.

Octopus is one of the best releases ever. Our support engineers often upgrade old, unmaintained installations from some super old release to Octopus to get them running again or to have proper tooling to fix the issue. But I agree, we at croit are still afraid of pushing our users to Pacific, as we encounter bugs in our tests. This will change soon, however, as we believe we are close to a stable enough Pacific release.

--
Martin Verges
Managing director
Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


On Wed, 17 Nov 2021 at 18:41, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> Sorry to be a bit edgy, but...
>
> So at least 5 customers that you know of have a test cluster, or do you
> have 5 test clusters? So 5 test clusters out of how many total Ceph
> clusters worldwide?
>
> Answers like this miss the point. Ceph is an amazing concept. That it
> is Open Source makes it 10x more amazing. But storage is big, like
> glaciers and tectonic plates. The potential to lose or lose access to
> millions of files/objects or petabytes of data is enough to keep you up
> at night.
>
> Many of us out here have become critically dependent on Ceph storage,
> and probably most of us can barely afford our production clusters, much
> less a test cluster.
>
> The best I could do right now for a test cluster would be 3 VirtualBox
> VMs with about 10GB of disk each. Does anybody out there think I could
> find my way past some of the more gnarly O and P issues with this as my
> test cluster?
>
> The real point here: From what I'm reading in this mailing list it
> appears that most non-developers are currently afraid to risk an
> upgrade to Octopus or Pacific. If this is an accurate perception then
> THIS IS THE ONLY PROBLEM.
>
> Don't shame the users who are more concerned about stability than
> fresh paint.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
>
> On Wed, Nov 17, 2021 at 11:18 AM Stefan Kooman <stefan@xxxxxx> wrote:
> > On 11/17/21 16:19, Marc wrote:
> > >> The CLT is discussing a more feasible alternative to LTS, namely
> > >> to publish an RC for each point release and involve the user
> > >> community to help test it.
> > >
> > > How many users even have the availability of a 'test cluster'?
> >
> > At least 5 (one physical 3-node).
> > We installed a few of them with the exact same version as when we
> > started prod (Luminous 12.2.4 IIRC) and have upgraded them ever
> > since. Especially for cases where old pieces of metadata might cause
> > issues in the long run (pre-Jewel metadata blows up in Pacific in the
> > MDS case). Same for the OSD OMAP conversion troubles in Pacific.
> > Especially in these cases, testing before real prod might have
> > revealed that. A VM environment would be ideal for this, as you could
> > just snapshot state and play it back when needed. Ideally with MDS /
> > RGW / RBD workloads on them to make sure all use cases are tested.
> >
> > But these clusters don't have the same load as prod, nor the same
> > data, so things might still break in special ways. But at least we
> > try to avoid that as much as possible.
> >
> > Gr. Stefan
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
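[Editor's note] Stefan's suggestion of snapshotting VM state and playing it back could look roughly like the sketch below. This is only an illustration, not anything from the thread: the VM names (ceph-test-1..3) and snapshot label are hypothetical, and the VBoxManage calls are guarded by a dry-run flag so nothing is executed unless you opt in.

```shell
#!/bin/sh
# Hypothetical sketch: snapshot each VirtualBox test-cluster node before
# attempting a Ceph upgrade, so the pre-upgrade state can be restored if
# the upgrade breaks. VM names and the snapshot label are assumptions.
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually invoke VBoxManage
PLAN=""

run() {
    # Record the command and either print it (dry run) or execute it.
    PLAN="$PLAN$*;"
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

for vm in ceph-test-1 ceph-test-2 ceph-test-3; do
    run VBoxManage snapshot "$vm" take "pre-pacific-upgrade"
done

# After a failed upgrade, roll every node back to the saved state:
# for vm in ceph-test-1 ceph-test-2 ceph-test-3; do
#     VBoxManage controlvm "$vm" poweroff
#     VBoxManage snapshot "$vm" restore "pre-pacific-upgrade"
# done
```

Taking the snapshots before the `ceph orch upgrade` (or package upgrade) step gives you the "play back when needed" safety net Stefan describes, which a bare-metal test cluster cannot offer as cheaply.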