Re: 2 osd failures

Well not entirely too late I guess :-(

I woke up this morning to see that two OTHER OSDs had been marked down and out.

I again restarted the OSD daemons and things seem to be OK at this point.

I agree that I need to get to the bottom of why this happened.

I have uploaded the log files from one of the downed OSDs here:

http://filebin.ca/2uFoRw017TCD/ceph-osd.51.log.1
http://filebin.ca/2uFosTO8oHmj/ceph-osd.51.log

You can see my OSD restart at about 6:15 am this morning. Other than that, I don't see anything indicated in the log files (although I could certainly be missing it).
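
For my next pass through the logs I am planning to grep along these lines; the paths and patterns below are just the usual suspects, so treat it as a sketch rather than a checklist:

  # on a monitor host: why the cluster decided the OSD was down
  grep 'osd.51' /var/log/ceph/ceph.log | grep -iE 'fail|down|boot'

  # on the OSD host: heartbeat trouble or a crash right before it dropped
  grep -iE 'heartbeat_check|wrongly marked|suicide|abort' /var/log/ceph/ceph-osd.51.log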

Just an FYI: we are currently running Ceph version 0.94.9, which I upgraded to at the end of last week (from 0.94.6, I think).

This cluster is about two or three years old at this point, and we have not run into this issue at all until now.

Thanks,

Shain


On 09/07/2016 12:00 AM, Christian Balzer wrote:
Hello,

Too late I see, but still...

On Tue, 6 Sep 2016 22:17:05 -0400 Shain Miley wrote:

Hello,

It looks like we had two OSDs fail at some point earlier today; here is
the current status of the cluster:

You will really want to find out how and why that happened, because,
while not impossible, this is pretty improbable.

Something like a HW failure, both OSDs sitting on the same host, an OOM
event, etc.
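
A quick sketch of what I would check first (device name below is a placeholder; behind a RAID controller you may need the controller's own tools or smartctl's -d option):

  ceph osd tree                        # are osd.64 and osd.76 under the same host bucket?
  dmesg -T | grep -i 'out of memory'   # any OOM kills around the time they dropped?
  smartctl -a /dev/sdX                 # SMART state of the disk behind a failed OSD
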
root@rbd1:~# ceph -s
      cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
       health HEALTH_WARN
              2 pgs backfill
              5 pgs backfill_toofull
Bad, you will want your OSDs back in and then some.
Have a look at "ceph osd df".
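
Something like this will show which OSDs are close to full and whether the toofull PGs are stuck behind a specific one; the injectargs line is only a temporary crutch, assuming the default osd_backfill_full_ratio of 0.85, and I would not push it much past 0.90:

  ceph osd df                          # per-OSD utilisation, watch the %USE column
  ceph health detail | grep -i full    # which PGs/OSDs are flagged toofull
  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'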

              69 pgs backfilling
              74 pgs degraded
              1 pgs down
              1 pgs peering
Not good either.
W/o bringing back your OSDs that means doom for the data on those PGs.
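
To see exactly which PG is stuck and which OSDs it is waiting for, something along these lines (the pgid is a placeholder, use whatever "health detail" reports):

  ceph health detail | grep -iE 'down|peering'
  ceph pg dump_stuck inactive
  ceph pg <pgid> query                 # shows which OSDs the down+peering PG is waiting for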

              74 pgs stuck degraded
              1 pgs stuck inactive
              75 pgs stuck unclean
              74 pgs stuck undersized
              74 pgs undersized
              recovery 1903019/105270534 objects degraded (1.808%)
              recovery 1120305/105270534 objects misplaced (1.064%)
              crush map has legacy tunables
       monmap e1: 3 mons at
{hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}
              election epoch 282, quorum 0,1,2 hqceph1,hqceph2,hqceph3
       osdmap e25019: 108 osds: 105 up, 105 in; 74 remapped pgs
        pgmap v30721368: 3976 pgs, 17 pools, 144 TB data, 51401 kobjects
              285 TB used, 97367 GB / 380 TB avail
              1903019/105270534 objects degraded (1.808%)
              1120305/105270534 objects misplaced (1.064%)
                  3893 active+clean
                    69 active+undersized+degraded+remapped+backfilling
                     6 active+clean+scrubbing
                     3 active+undersized+degraded+remapped+backfill_toofull
                     2 active+clean+scrubbing+deep
When in recovery/backfill situations, you always want to stop any and all
scrubbing.
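
For reference, that is just the cluster-wide flags (remember to unset them once things are healthy again):

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # ...and after recovery has finished:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub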

                     2 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
                     1 down+peering
recovery io 248 MB/s, 84 objects/s

We had been running for a while with 107 OSDs (not 108); it looks like
OSDs 64 and 76 are both now down and out at this point.


I have looked through the Ceph logs for each OSD and did not see anything
obvious; the RAID controller also does not show the disk offline.

Get to the bottom of that, normally something gets logged when an OSD
fails.

I am wondering if I should try to restart the two OSDs that are showing
as down, or whether I should wait until the current recovery is complete?

As said, try to restart immediately, just to keep the traffic down for
starters.
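
Depending on your distro and init system, one of these variants should bring them back (substitute the right IDs, 64 and 76 in your case); this is a sketch, since the exact service names differ between sysvinit, upstart and systemd installs:

  /etc/init.d/ceph start osd.64        # sysvinit
  start ceph-osd id=64                 # upstart (Ubuntu 14.04 era)
  systemctl start ceph-osd@64          # systemd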

The pool has a replica level of '2', and with two failed disks I want to
do whatever I can to make sure there is not an issue with missing objects.

I sure hope that pool holds backups or something of that nature.

The only time a replica count of 2 isn't a cry for Murphy to smite you is
with RAID-backed OSDs or VERY well monitored and vetted SSDs.
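
If you do decide to go to 3 replicas once the cluster has settled down (and you have the capacity for it), it is only a couple of commands; the pool name below is a placeholder, and note that bumping the size will kick off a lot of backfill:

  ceph osd pool get <poolname> size
  ceph osd pool get <poolname> min_size
  ceph osd pool set <poolname> size 3
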
Thanks in advance,

Shain





--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649



