Ceph crash, how to analyse and recover

Hello Ceph Users,

We have a Ceph test cluster that we want to bring into production; it will grow rapidly in the future.
Ceph version:
ceph          0.80.7-2+deb8u1   amd64   distributed storage and file system
ceph-common   0.80.7-2+deb8u1   amd64   common utilities to mount and interact with a ceph storage cluster


Our config: 5 hosts, each running 12 OSDs; the cluster contains 2 objects.

One node went down and stayed down for about 12 hours. After it was
brought back online (manually), the entire cluster slowly ground to a
halt. The status is:

First status after this crash:

    cluster e2295d66-a265-11e5-8c92-00219bfd424c
     health HEALTH_WARN 4628 pgs down; 4628 pgs peering; 4628 pgs stuck inactive; 4628 pgs stuck unclean
     monmap e3: 3 mons at {a=172.30.0.2:6789/0,b=172.30.0.67:6789/0,mon=172.30.0.1:6789/0}, election epoch 16, quorum 0,1,2 mon,a,b
     osdmap e18880: 60 osds: 48 up, 48 in
      pgmap v127495: 4628 pgs, 4 pools, 1238 bytes data, 4 objects
            283 GB used, 130 TB / 130 TB avail
                4628 down+peering
 
The Ceph status at this moment:
# ceph status
    cluster e2295d66-a265-11e5-8c92-00219bfd424c
     health HEALTH_WARN 4622 pgs down; 4628 pgs peering; 1427 pgs stale; 4628 pgs stuck inactive; 1427 pgs stuck stale; 4628 pgs stuck unclean; 2/17 in osds are down; 1 mons down, quorum 1,2 a,b
     monmap e3: 3 mons at {a=172.30.0.2:6789/0,b=172.30.0.67:6789/0,mon=172.30.0.1:6789/0}, election epoch 18, quorum 1,2 a,b
     osdmap e19242: 60 osds: 15 up, 17 in
      pgmap v128135: 4628 pgs, 4 pools, 118 bytes data, 3 objects
            100 GB used, 47383 GB / 47483 GB avail
                   3 peering
                1424 stale+down+peering
                3198 down+peering
                   3 stale+peering

   

It is a test cluster, so no real harm done. How can we get it back up,
and why did this happen?
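For anyone reading this in the archives: a first-pass diagnosis for stuck/down PGs like the above could look like the sketch below. These are standard ceph CLI calls available in Firefly (0.80); the PG id 0.1 and osd.0 are placeholders, so substitute ids reported by your own cluster.

```shell
# Show which PGs are unhealthy and why (per-PG detail)
ceph health detail

# Confirm which OSDs the monitors consider up/in
ceph osd tree

# List the PGs that are stuck inactive
ceph pg dump_stuck inactive

# Ask one affected PG why it cannot peer; "0.1" is a
# placeholder -- use an id reported by dump_stuck
ceph pg 0.1 query

# If the OSD daemons on the recovered node never came back,
# start them again (Debian sysvinit, as in this 0.80 setup);
# "osd.0" is a placeholder for an OSD on that host
service ceph start osd.0
```

The `ceph pg <pgid> query` output in particular lists the OSDs the PG is blocked waiting on, which usually points at the daemons that need restarting.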


Regards, Arnoud.


This message may contain confidential information and is intended exclusively for the addressee. If you receive this message unintentionally, please do not use the contents but notify the sender immediately by return e-mail. University Medical Center Utrecht is a legal person by public law and is registered at the Chamber of Commerce for Midden-Nederland under no. 30244197.

Please consider the environment before printing this e-mail.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
