Hey guys...

1./ I have a simple question regarding the appearance of degraded PGs. First, for reference:
    a. I am working with 0.94.2.

2./ I am testing the cluster in several disaster scenarios, and I deliberately powered down a storage server with its 8 OSDs. At that point, everything went fine: during the night, the cluster performed all the recovery I/O, and in the morning I got a 'HEALTH_OK' cluster running on only 3 servers and 24 OSDs.

3./ I have now powered up the missing server and, as expected, the cluster enters 'HEALTH_WARN' and adjusts itself to the presence of one more server and 8 more populated OSDs.

4./ However, what I do not understand is why, during this process, some PGs are reported as degraded. Check the 'ceph -s' output next. As far as I understand, a degraded PG means that Ceph has not yet replicated some objects in the placement group the correct number of times. That should not be the case here because, since we started from a 'HEALTH_OK' situation, all PGs were coherent. What happens under the covers when this new server (and its 8 populated OSDs) rejoins the cluster that triggers the appearance of degraded PGs?
# ceph -s
    cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
     health HEALTH_WARN
            115 pgs backfill
            121 pgs backfilling
            513 pgs degraded
            31 pgs recovering
            309 pgs recovery_wait
            513 pgs stuck degraded
            576 pgs stuck unclean
            recovery 198838/8567132 objects degraded (2.321%)
            recovery 3267325/8567132 objects misplaced (38.138%)
     monmap e1: 3 mons at {mon1=X.X.X.X:6789/0,mon2=X.X.X.X.34:6789/0,mon3=X.X.X.X:6789/0}
            election epoch 24, quorum 0,1,2 mon1,mon3,mon2
     mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
     osdmap e4764: 32 osds: 32 up, 32 in; 555 remapped pgs
      pgmap v1159567: 2176 pgs, 2 pools, 6515 GB data, 2240 kobjects
            22819 GB used, 66232 GB / 89051 GB avail
            198838/8567132 objects degraded (2.321%)
            3267325/8567132 objects misplaced (38.138%)
                1600 active+clean
                 292 active+recovery_wait+degraded+remapped
                 113 active+degraded+remapped+backfilling
                  60 active+degraded+remapped+wait_backfill
                  55 active+remapped+wait_backfill
                  27 active+recovering+degraded+remapped
                  17 active+recovery_wait+degraded
                   8 active+remapped+backfilling
                   4 active+recovering+degraded
  recovery io 521 MB/s, 170 objects/s

Cheers
Goncalo

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937 |
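PS: in case it is useful to anyone following along, here is a minimal sketch of how one can pull just the degraded ratio out of that status output while the backfill runs. The sample line is hard-coded below for illustration; in practice you would pipe `ceph -s` through the same filter.

```shell
# Extract the degraded-objects percentage from a 'ceph -s' status line.
# The sample line is copied from the output above (hard-coded here for
# illustration; normally you would feed 'ceph -s' into the pipeline).
status_line='recovery 198838/8567132 objects degraded (2.321%)'

# Capture whatever sits inside the parentheses, e.g. "2.321%".
degraded=$(printf '%s\n' "$status_line" | sed -n 's/.*(\(.*%\)).*/\1/p')
echo "degraded: $degraded"
```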
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com