Hi Ronny,
Thanks for the detailed answer. It's much appreciated! I will keep this
in the back of my mind, but for now the cost is prohibitive, as we're
using these servers not as storage-only boxes but as full-fledged servers
(i.e. Ceph is mounted locally, and there's a webserver and a database).
Also, two servers can be connected with a crossover cable; three servers
would require a switch, and so on. It adds up quite quickly if you are
really on a tight budget. Sometimes it's not so easy to advocate for new
hardware when the benefits are not apparent to everyone :-)
In addition, one reason why we're using Ceph in the first place is that
maintenance is easy: one server can go down, the other keeps running, and
the first catches up as soon as it comes back online. With 2/2 we'd lose
exactly that - so it's a no-go. Of course, if the second node goes down
as well we have a problem, but OTOH any new changes won't happen then
anyway, since no writes will be possible. In addition, both servers are
equipped with hardware RAID and a BBU. In combination with our solid
backups, I'm currently willing to take that risk. If we grow further, we
might want to look at the 3/2 solution, though. Thanks again for letting
me know about the underlying reasons!
Best regards,
Ranjan
On 25.04.2018 19:40, Ronny Aasen wrote:
The difference in cost between 2 and 3 servers is not HUGE, but the
reliability difference between a size 2/1 pool and a 3/2 pool is
massive. A 2/1 pool is just a single fault during maintenance away
from data loss, whereas you need multiple simultaneous faults, and
very bad luck, to break a 3/2 pool.
I would rather recommend 2/2 pools if you are willing to accept
a little downtime when a disk dies: the cluster I/O would stop until
the remaining disks backfill to cover for the lost disk.
But that is better than having inconsistent PGs or data loss because a
disk crashed during a routine reboot, or because two disks failed at once.
Also worth reading is this link, which has a good explanation:
https://www.spinics.net/lists/ceph-users/msg32895.html
You have good backups and are willing to restore the whole pool, and
it is of course your privilege to run 2/1 pools, but be mindful of
the risks of doing so.
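For reference, the replication settings can be checked and changed per
pool with the standard CLI, roughly like this (the pool name is just a
placeholder):

    # show size/min_size of all pools
    ceph osd pool ls detail
    # or per pool
    ceph osd pool get <poolname> size
    ceph osd pool get <poolname> min_size
    # change a pool to 2/2 (or 3/2 once a third OSD host exists)
    ceph osd pool set <poolname> size 2
    ceph osd pool set <poolname> min_size 2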
kind regards
Ronny Aasen
BTW: I did not know Ubuntu automagically rebooted after an upgrade. You
can probably avoid that reboot somehow in Ubuntu and do the restarts
of the services manually, if you wish to maintain service during the upgrade.
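If that reboot comes from unattended-upgrades (an assumption about your
setup; a full release upgrade behaves differently), it can usually be
switched off there:

    # /etc/apt/apt.conf.d/50unattended-upgrades
    Unattended-Upgrade::Automatic-Reboot "false";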
On 25.04.2018 11:52, Ranjan Ghosh wrote:
Thanks a lot for your detailed answer. The problem for us, however,
was that we use the Ceph packages that come with the Ubuntu
distribution. If you do an Ubuntu upgrade, all packages are upgraded
in one go and the server is rebooted. You cannot influence anything
or start/stop services one by one etc. This was concerning me, because
the upgrade instructions didn't mention anything about an alternative
or what to do in this case. But someone here enlightened me that, in
general, it all doesn't matter that much *if you are just accepting a
downtime*. And, indeed, it all worked nicely. We stopped all services
on all servers, upgraded the Ubuntu version, rebooted all servers and
were ready to go again. We didn't encounter any problems there. The only
problem turned out to be our own fault and simply a firewall
misconfiguration.
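In case it helps others, the manual equivalent per node would be roughly
the following (a sketch, assuming the standard systemd units from the
Ubuntu packages; the exact commands we ran may have differed slightly):

    systemctl stop ceph.target     # stop all Ceph daemons on this node
    do-release-upgrade             # upgrade Ubuntu; reboots at the end
    # after the reboot the Ceph daemons are started again automatically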
And, yes, we're running "size:2 min_size:1" because we're on a very
tight budget. If I understand correctly, this means: changes to a file
are made on one server and *eventually* copied to the other server. I
hope this *eventually* means after a few minutes. Up until now I've
never experienced *any* problems with file integrity with this
configuration. In fact, Ceph is incredibly stable. Amazing. I have
never ever had any issues whatsoever with broken files, partially
written files, files that contain garbage etc., even after
starting/stopping services, rebooting and so on. With GlusterFS and other
cluster file systems I've experienced many such problems over the
years, so this is what makes Ceph so great. I now have a lot of trust
in Ceph that it will eventually repair everything :-) And if a file
that was written a few seconds ago were really lost, it wouldn't be
that bad for our use case. It's a web server; the most important stuff is
in the DB, and we have hourly backups of everything. In a huge emergency,
we could even restore the backup from an hour ago if we really had
to. Not nice, but if it happens every six years or so due to some
freak hardware failure, I think it is manageable. I accept it's not
the recommended/perfect solution if you have infinite amounts of
money at hand, but in our case, I think it's not extremely
audacious either to do it like this, right?
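Side note: the catching up after a node comes back can be watched live,
for example with:

    ceph -w    # streaming status/cluster log, shows recovery as it happens
    ceph -s    # one-off snapshot, shows degraded objects and recovery progress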
On 11.04.2018 19:25, Ronny Aasen wrote:
Ceph upgrades are usually not a problem:
Ceph has to be upgraded in the right order. Normally, when each
service is on its own machine, this is not difficult,
but when you have mon, mgr, osd, mds, and clients on the same host
you have to do it a bit carefully.
I tend to have a terminal open with "watch ceph -s" running, and I
never touch another service until the health is OK again.
First, apt upgrade the packages on all the hosts. This only updates
the software on disk, not the running services.
Then restart the services in the right order, and only on one
host at a time.
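Roughly (a sketch, assuming a systemd-based install from packages):

    # on every host: update only the packages; daemons keep running the old binaries
    apt-get update && apt-get upgrade
    # show which daemons are still running the old version (Luminous and later)
    ceph versions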
mons: first restart the mon service on all mon-running hosts.
All 3 mons are active at the same time, so there is no "shifting
around", but make sure the quorum is OK again before you do the next
mon.
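Something like this on each mon host in turn (the unit name is usually
the short hostname, but that depends on your setup):

    systemctl restart ceph-mon@$(hostname -s)
    ceph quorum_status    # wait until all 3 mons are listed in the quorum again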
mgr: then restart the mgr on all hosts that run a mgr. There is only one
active mgr at a time now, so here there will be a bit of shifting
around, but it is only for statistics/management, so it may affect
your ceph -s command, but not the cluster operation.
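For example (again assuming the unit is named after the short hostname):

    systemctl restart ceph-mgr@$(hostname -s)
    ceph -s | grep mgr    # check that an active mgr is reported again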
osd: restart the osd processes one OSD at a time, and make sure the
cluster is HEALTH_OK before doing the next osd process. Do this for all
hosts that have OSDs.
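For example:

    ceph osd tree                 # shows which OSD ids live on this host
    systemctl restart ceph-osd@0  # restart one OSD, e.g. osd.0
    ceph health                   # wait for HEALTH_OK before the next one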
mds: restart the mds daemons one at a time. You will notice the standby mds
taking over for the mds that was restarted. Do both.
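For example:

    systemctl restart ceph-mds@$(hostname -s)
    ceph mds stat    # shows which mds is active and which is standby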
clients: restart the clients; that means remount filesystems, migrate or
restart VMs, or restart whatever process uses the old Ceph libraries.
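For a kernel-mounted CephFS client that can be as simple as (the
mountpoint is just an example, and this assumes a matching fstab entry):

    umount /mnt/cephfs && mount /mnt/cephfs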
About pools:
since you only have 2 OSDs, you obviously cannot be running the
recommended 3-replica pools. This makes me worry that you may
be running size:2 min_size:1 pools and are running a daily risk of
data loss due to corruption and inconsistencies, especially when you
restart OSDs.
If your pools are size:2 min_size:2, then your cluster's I/O will stop
whenever an OSD is restarted, until that OSD is up and healthy again, but
you have less chance of data loss than with 2/1 pools.
If you added an OSD on a third host, you could run size:3 min_size:2,
the recommended config, where you have both redundancy and high
availability.
kind regards
Ronny Aasen
On 11.04.2018 17:42, Ranjan Ghosh wrote:
Ah, never mind, we've solved it. It was a firewall issue. The only
thing that's weird is that it became an issue immediately after an
update. Perhaps it has something to do with monitor nodes shifting
around or similar. Well, thanks again for your quick support,
though. It's much appreciated.
BR
Ranjan
On 11.04.2018 17:07, Ranjan Ghosh wrote:
Thank you for your answer. Do you have any specifics on which
thread you're talking about? I would be very interested to read
about a success story, because I fear that if I update the other
node, the whole cluster will come down.
Am 11.04.2018 um 10:47 schrieb Marc Roos:
I think you have to update all osd's, mon's etc. I can remember
running
into similar issue. You should be able to find more about this in
mailing list archive.
-----Original Message-----
From: Ranjan Ghosh [mailto:ghosh@xxxxxx]
Sent: Wednesday, 11 April 2018 16:02
To: ceph-users
Subject: Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2
Hi all,
We have a two-node cluster (with a third "monitoring-only" node). Over
the last months, everything ran *perfectly* smoothly. Today, I did an
Ubuntu "apt-get upgrade" on one of the two servers. Among others, the
Ceph packages were upgraded from 12.2.1 to 12.2.2. A minor release
update, one might think. But, to my surprise, after restarting the
services, Ceph is now in a degraded state :-( (see below). Only the first
node - which is still on 12.2.1 - seems to be running. I did a bit of
research and found this:
https://ceph.com/community/new-luminous-pg-overdose-protection/
I did set "mon_max_pg_per_osd = 300" to no avail. I don't know if this is
the problem at all.
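Roughly, that option would go into ceph.conf and can then be verified at
runtime via the admin socket (the mon id below is just an example;
whether this option is even relevant to the problem, I don't know):

    # /etc/ceph/ceph.conf
    [global]
    mon_max_pg_per_osd = 300

    # verify on a mon host (the mon id is usually the short hostname)
    ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd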
Looking at the status, it seems we have 264 PGs, right? When I enter
"ceph osd df" (which I found on another website claiming it should print
the number of PGs per OSD), it just hangs (I need to abort it with
Ctrl+C).
I hope somebody can help me. The cluster now works with the single node,
but it is definitely quite worrying because we don't have redundancy.
Thanks in advance,
Ranjan
root@tukan2 /var/www/projects # ceph -s
  cluster:
    id:     19895e72-4a0c-4d5d-ae23-7f631ec8c8e4
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            Reduced data availability: 264 pgs inactive
            Degraded data redundancy: 264 pgs unclean

  services:
    mon: 3 daemons, quorum tukan1,tukan2,tukan0
    mgr: tukan0(active), standbys: tukan2
    mds: cephfs-1/1/1 up {0=tukan2=up:active}
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   3 pools, 264 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com