I started a Ceph cluster on my machine in development mode to estimate the recovery time after increasing pgp_num.
All of the daemons run on one machine.
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Memory: 377 GB
OS: CentOS Linux release 7.6.1810
Ceph version: Hammer
I built Ceph according to http://docs.ceph.com/docs/hammer/dev/quick_guide/.
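For reference, I brought the cluster up with vstart.sh from the source tree, roughly as below (the exact invocation is from memory, so treat it as approximate; the relevant part is one monitor and 30 OSDs, no MDS):

    cd ceph/src
    MON=1 OSD=30 MDS=0 ./vstart.sh -n -d -x   # -n new cluster, -d debug output, -x enable cephx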
ceph -s shows:
cluster 15ec2f3f-86e5-46bc-bf98-4b35841ee6a5
health HEALTH_WARN
pool rbd pg_num 512 > pgp_num 256
monmap e1: 1 mons at {a=172.30.250.25:6789/0}
election epoch 2, quorum 0 a
osdmap e88: 30 osds: 30 up, 30 in
pgmap v829: 512 pgs, 1 pools, 57812 MB data, 14454 objects
5691 GB used, 27791 GB / 33483 GB avail
512 active+clean
The output of ceph osd tree is in [3].
Recovery started after I increased pgp_num. ceph -w reports that some OSDs are down, but the ceph-osd processes are still running. All OSD and mon configuration options are at their defaults [1].
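For clarity, the change that triggered the recovery was roughly the following (pool name and target value are taken from the health warning above; the exact command line is from memory):

    ceph osd pool set rbd pgp_num 512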
Some of the messages reported by ceph -w [2] are shown below:
2019-06-26 15:03:21.839750 mon.0 [INF] pgmap v842: 512 pgs: 127 active+degraded, 84 activating+degraded, 256 active+clean, 45 active+recovering+degraded; 57812 MB data, 5714 GB used, 27769 GB / 33483 GB avail; 22200/43362 objects degraded (51.197%); 50789 kB/s, 12 objects/s recovering
2019-06-26 15:03:21.840884 mon.0 [INF] osd.1 172.30.250.25:6804/22500 failed (3 reports from 3 peers after 24.867116 >= grace 20.000000)
2019-06-26 15:03:21.841459 mon.0 [INF] osd.9 172.30.250.25:6836/25078 failed (3 reports from 3 peers after 24.867645 >= grace 20.000000)
2019-06-26 15:03:21.841709 mon.0 [INF] osd.0 172.30.250.25:6800/22260 failed (3 reports from 3 peers after 24.846423 >= grace 20.000000)
2019-06-26 15:03:21.842286 mon.0 [INF] osd.13 172.30.250.25:6852/26651 failed (3 reports from 3 peers after 24.846896 >= grace 20.000000)
2019-06-26 15:03:21.842607 mon.0 [INF] osd.5 172.30.250.25:6820/23661 failed (3 reports from 3 peers after 24.804869 >= grace 20.000000)
2019-06-26 15:03:21.842938 mon.0 [INF] osd.10 172.30.250.25:6840/25490 failed (3 reports from 3 peers after 24.805155 >= grace 20.000000)
2019-06-26 15:03:21.843134 mon.0 [INF] osd.12 172.30.250.25:6848/26277 failed (3 reports from 3 peers after 24.805329 >= grace 20.000000)
2019-06-26 15:03:21.843591 mon.0 [INF] osd.8 172.30.250.25:6832/24722 failed (3 reports from 3 peers after 24.805843 >= grace 20.000000)
2019-06-26 15:03:21.849664 mon.0 [INF] osd.21 172.30.250.25:6884/29762 failed (3 reports from 3 peers after 23.497080 >= grace 20.000000)
2019-06-26 15:03:21.862729 mon.0 [INF] osd.14 172.30.250.25:6856/27025 failed (3 reports from 3 peers after 23.510172 >= grace 20.000000)
2019-06-26 15:03:21.864222 mon.0 [INF] osdmap e91: 30 osds: 29 up, 30 in
2019-06-26 15:03:20.336758 osd.11 [WRN] map e91 wrongly marked me down
2019-06-26 15:03:23.408659 mon.0 [INF] pgmap v843: 512 pgs: 8 stale+activating+degraded, 8 stale+active+clean, 161 active+degraded, 2 stale+active+recovering+degraded, 33 activating+degraded, 248 active+clean, 45 active+recovering+degraded, 7 stale+active+degraded; 57812 MB data, 5730 GB used, 27752 GB / 33483 GB avail; 27317/43362 objects degraded (62.998%); 61309 kB/s, 14 objects/s recovering
2019-06-26 15:03:27.538229 mon.0 [INF] osd.18 172.30.250.25:6872/28632 failed (3 reports from 3 peers after 23.180489 >= grace 20.000000)
2019-06-26 15:03:27.539416 mon.0 [INF] osd.7 172.30.250.25:6828/24366 failed (3 reports from 3 peers after 21.900054 >= grace 20.000000)
2019-06-26 15:03:27.541831 mon.0 [INF] osdmap e92: 30 osds: 19 up, 30 in
2019-06-26 15:03:32.748179 mon.0 [INF] osdmap e93: 30 osds: 17 up, 30 in
2019-06-26 15:03:33.678682 mon.0 [INF] pgmap v845: 512 pgs: 17 stale+activating+degraded, 95 stale+active+clean, 55 active+degraded, 13 peering, 18 stale+active+recovering+degraded, 20 activating+degraded, 155 active+clean, 22 active+recovery_wait+degraded, 48 active+recovering+degraded, 69 stale+active+degraded; 57812 MB data, 5734 GB used, 27748 GB / 33483 GB avail; 26979/43362 objects degraded (62.218%); 11510 kB/s, 2 objects/s recovering
2019-06-26 15:03:33.775701 osd.1 [WRN] map e92 wrongly marked me down
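(To clarify the remark above that the processes are still running: the OSDs logging "wrongly marked me down" were still alive at the OS level. The commands below only illustrate the kind of check I did, not an exact transcript:)

    ps aux | grep ceph-osd        # all 30 ceph-osd processes are still present
    ceph daemon osd.1 version     # the admin socket of a "down" OSD still responds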
Has anyone got any thoughts on what might have happened, or tips on how to dig further into this?