Hello,

I am playing around with Ceph (ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)) on Debian Jessie and I built a test setup:

$ ceph osd tree
ID WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.01497 root default
-2 0.00499     host af-staging-ceph01
 0 0.00499         osd.0                   up  1.00000          1.00000
-3 0.00499     host af-staging-ceph02
 1 0.00499         osd.1                   up  1.00000          1.00000
-4 0.00499     host af-staging-ceph03
 2 0.00499         osd.2                   up  1.00000          1.00000

So I have 3 OSDs on 3 servers. I also created 2 pools:

ceph osd dump | grep 'replicated size'
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 33 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 31 flags hashpspool stripe_width 0

Now I am testing failover and kill one of the servers:

ceph osd tree
ID WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.01497 root default
-2 0.00499     host af-staging-ceph01
 0 0.00499         osd.0                   up  1.00000          1.00000
-3 0.00499     host af-staging-ceph02
 1 0.00499         osd.1                 down  1.00000          1.00000
-4 0.00499     host af-staging-ceph03
 2 0.00499         osd.2                   up  1.00000          1.00000

And now it is stuck in the recovery state:

ceph -s
    cluster 6b5ff07a-7232-4840-b486-6b7906248de7
     health HEALTH_WARN
            64 pgs degraded
            18 pgs stuck unclean
            64 pgs undersized
            recovery 21/63 objects degraded (33.333%)
            1/3 in osds are down
            1 mons down, quorum 0,2 af-staging-ceph01,af-staging-ceph03
     monmap e1: 3 mons at {af-staging-ceph01=10.36.0.121:6789/0,af-staging-ceph02=10.36.0.122:6789/0,af-staging-ceph03=10.36.0.123:6789/0}
            election epoch 38, quorum 0,2 af-staging-ceph01,af-staging-ceph03
      fsmap e29: 1/1/1 up {0=af-staging-ceph03.crm.ig.local=up:active}, 2 up:standby
     osdmap e78: 3 osds: 2 up, 3 in; 64 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v334: 64 pgs, 2 pools, 47129 bytes data, 21 objects
            122 MB used, 15204 MB / 15326 MB avail
            21/63 objects degraded (33.333%)
                  64 active+undersized+degraded

And if I kill one more node, I lose access to the mounted file system on the client.

Normally I would expect the replication factor to be respected and Ceph to create the missing copies of the degraded PGs. I tried rebuilding the crush map so that the rule looks like this, but it did not help:

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd
        step emit
}
# end crush map

I would very much appreciate any help.

Thank you very much in advance,
Oleg.
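P.S. For completeness, this is roughly how I applied the edited rule, using the usual decompile/edit/recompile round trip (the file names below are just placeholders):

ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
# edited /tmp/crushmap.txt so the rule's chooseleaf step reads "type osd", as shown above
crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
ceph osd setcrushmap -i /tmp/crushmap.new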