Hey Greg...
Thanks for the reply.
At this point the cluster has recovered, so I am no longer in that
situation. I'll try to go back, reproduce it, and post the pg query for
one of those degraded PGs later on.
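In case it is useful, I was planning to grab it with something like the
following (the PG id below is just a placeholder; I'll pick a real one
from the 'ceph health detail' output):
# ceph health detail | grep degraded
# ceph pg 2.1f7 query > pg-2.1f7-query.json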
Cheers
Goncalo
On 08/27/2015 10:02 PM, Gregory Farnum wrote:
On Thu, Aug 27, 2015 at 2:54 AM, Goncalo Borges
<goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
Hey guys...
1./ I have a simple question regarding the appearance of degraded PGs.
First, for reference:
a. I am working with 0.94.2
b. I have 32 OSDs distributed across 4 servers, meaning I have 8 OSDs per
server.
c. Our cluster is set with 'osd pool default size = 3' and 'osd pool default
min size = 2'
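(Just to confirm the pools actually picked up those defaults, I checked the
per-pool values with something like the following; the pool name is just an
example from my setup:)
# ceph osd pool get cephfs_data size
# ceph osd pool get cephfs_data min_size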
2./ I am testing the cluster under several disaster scenarios, and I
deliberately powered down one storage server, with its 8 OSDs. Everything
went fine: during the night, the cluster performed all the recovery I/O, and
in the morning I had a 'HEALTH_OK' cluster running on only 3 servers and
24 OSDs.
3./ I've now powered the missing server back up and, as expected, the
cluster enters 'HEALTH_WARN' and adjusts itself to the presence of one more
server and its 8 populated OSDs.
4./ However, what I do not understand is why, during this process, some PGs
are reported as degraded. Check the 'ceph -s' output below. As far as I
understand, a degraded PG means that Ceph has not yet replicated some
objects in the placement group the correct number of times. That should not
be the case here: since we started from a 'HEALTH_OK' situation, all PGs
were coherent. What happens under the covers when this new server (and its
8 populated OSDs) rejoins the cluster that triggers the appearance of
degraded PGs?
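To be clear about which PGs I mean, I have been looking at the per-PG states
with something along these lines (output omitted here):
# ceph pg dump pgs_brief 2>/dev/null | grep degraded | head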
Hmm. I too would expect those PGs to be reporting as "remapped" rather
than "degraded". And indeed they are all remapped in addition to being
degraded. Can you get the pg query for one of these degraded PGs and
post it to the list? Sam, do you expect this behavior?
-Greg
# ceph -s
cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
health HEALTH_WARN
115 pgs backfill
121 pgs backfilling
513 pgs degraded
31 pgs recovering
309 pgs recovery_wait
513 pgs stuck degraded
576 pgs stuck unclean
recovery 198838/8567132 objects degraded (2.321%)
recovery 3267325/8567132 objects misplaced (38.138%)
monmap e1: 3 mons at
{mon1=X.X.X.X:6789/0,mon2=X.X.X.X.34:6789/0,mon3=X.X.X.X:6789/0}
election epoch 24, quorum 0,1,2 mon1,mon3,mon2
mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
osdmap e4764: 32 osds: 32 up, 32 in; 555 remapped pgs
pgmap v1159567: 2176 pgs, 2 pools, 6515 GB data, 2240 kobjects
22819 GB used, 66232 GB / 89051 GB avail
198838/8567132 objects degraded (2.321%)
3267325/8567132 objects misplaced (38.138%)
1600 active+clean
292 active+recovery_wait+degraded+remapped
113 active+degraded+remapped+backfilling
60 active+degraded+remapped+wait_backfill
55 active+remapped+wait_backfill
27 active+recovering+degraded+remapped
17 active+recovery_wait+degraded
8 active+remapped+backfilling
4 active+recovering+degraded
recovery io 521 MB/s, 170 objects/s
Cheers
Goncalo
--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com