Hey guys! I've got a cluster with 90 OSDs spread across 5 hosts, most of which are HDD-based. After some real-world testing, performance was not up to expectations, and as I started researching I realized that I _should_ have used my locally attached NVMes as BlueStore DB devices. So I decided to "out" all the OSDs on one node, wait for recovery, and then delete and recreate those OSDs with a separate metadata device. The recovery process was reasonably fast (>300 Mbps) until the very end, at which point it dropped to <1 Mbps. Interestingly, the number of misplaced objects is now gradually *growing*.
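For reference, this is roughly the per-host procedure I have in mind; the device paths are placeholders and I haven't run the rebuild step yet, so treat it as a sketch rather than exactly what I executed:

---
# Drain the host: mark every OSD under stg05 "out" so its data migrates away
for id in $(ceph osd ls-tree stg05); do ceph osd out "$id"; done

# Once backfill finishes, remove each OSD and rebuild it with its
# RocksDB (block.db) on the local NVMe (paths below are placeholders)
ceph osd purge <id> --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --data /dev/sdX --block.db /dev/nvme0n1
---

(If these were cephadm-managed hosts, the last step would presumably be an OSD service spec with "db_devices" pointing at the NVMe instead, but the idea is the same.)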
Here's what "ceph -s" shows me:

---
  cluster:
    id:     4f4d6b12-7036-42d2-9366-8c99e4897391
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            noout flag(s) set
            131 pgs not deep-scrubbed in time
            87 pgs not scrubbed in time
            3 daemons have recently crashed

  services:
    mon: 3 daemons, quorum b,d,e (age 20h)
    mgr: a(active, since 20h)
    mds: 4/4 daemons up, 2 hot standby
    osd: 77 osds: 77 up (since 8h), 56 in (since 5d); 33 remapped pgs
         flags noout
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   15 pools, 401 pgs
    objects: 43.43M objects, 91 TiB
    usage:   122 TiB used, 536 TiB / 659 TiB avail
    pgs:     942074/154910213 objects misplaced (0.608%)
             359 active+clean
             32  active+clean+remapped
             9   active+clean+scrubbing+deep
             1   active+remapped+backfilling

  io:
    client:   120 MiB/s rd, 17 MiB/s wr, 151 op/s rd, 319 op/s wr
    recovery: 1.7 MiB/s, 0 objects/s

  progress:
    Global Recovery Event (0s)
      [............................]
---

And here's "ceph osd tree" (I outed all the SSD OSDs on some of my hyperconverged hosts, and all disks on stg05):

---
ID   CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         840.67651  root default
 -3           0.93149      host node01
  0    ssd    0.93149          osd.0        up         0  1.00000
-11           0.93149      host node03
  4    ssd    0.93149          osd.4        up         0  1.00000
 -5           0.93149      host node04
  1    ssd    0.93149          osd.1        up         0  1.00000
 -7           0.93149      host node05
  2    ssd    0.93149          osd.2        up         0  1.00000
 -9           0.93149      host node06
  3    ssd    0.93149          osd.3        up         0  1.00000
-13           0.93149      host node07
  5    ssd    0.93149          osd.5        up         0  1.00000
-15           0.93149      host node08
  6    ssd    0.93149          osd.6        up         0  1.00000
-25         131.90070      host stg01
  7    hdd   10.91409          osd.7        up   1.00000  1.00000
 13    hdd   10.91409          osd.13       up   1.00000  1.00000
 14    hdd   10.91409          osd.14       up   1.00000  1.00000
 19    hdd   10.91409          osd.19       up   1.00000  1.00000
 23    hdd   10.91409          osd.23       up   1.00000  1.00000
 25    hdd   10.91409          osd.25       up   1.00000  1.00000
 30    hdd   10.91409          osd.30       up   1.00000  1.00000
 36    hdd   10.91409          osd.36       up   1.00000  1.00000
 39    hdd   10.91409          osd.39       up   1.00000  1.00000
 43    hdd   10.91409          osd.43       up   1.00000  1.00000
 48    hdd   10.91409          osd.48       up   1.00000  1.00000
 50    hdd   10.91409          osd.50       up   1.00000  1.00000
 34    ssd    0.46579          osd.34       up   1.00000  1.00000
 55    ssd    0.46579          osd.55       up   1.00000  1.00000
-31         175.56384      host stg02
 12    hdd   14.55269          osd.12       up   1.00000  1.00000
 18    hdd   14.55269          osd.18       up   1.00000  1.00000
 24    hdd   14.55269          osd.24       up   1.00000  1.00000
 29    hdd   14.55269          osd.29       up   1.00000  1.00000
 35    hdd   14.55269          osd.35       up   1.00000  1.00000
 41    hdd   14.55269          osd.41       up   1.00000  1.00000
 46    hdd   14.55269          osd.46       up   1.00000  1.00000
 52    hdd   14.55269          osd.52       up   1.00000  1.00000
 60    hdd   14.55269          osd.60       up   1.00000  1.00000
 64    hdd   14.55269          osd.64       up   1.00000  1.00000
 68    hdd   14.55269          osd.68       up   1.00000  1.00000
 72    hdd   14.55269          osd.72       up   1.00000  1.00000
  8    ssd    0.46579          osd.8        up   1.00000  1.00000
 58    ssd    0.46579          osd.58       up   1.00000  1.00000
-37         175.56384      host stg03
 11    hdd   14.55269          osd.11       up   1.00000  1.00000
 17    hdd   14.55269          osd.17       up   1.00000  1.00000
 21    hdd   14.55269          osd.21       up   1.00000  1.00000
 28    hdd   14.55269          osd.28       up   1.00000  1.00000
 32    hdd   14.55269          osd.32       up   1.00000  1.00000
 40    hdd   14.55269          osd.40       up   1.00000  1.00000
 45    hdd   14.55269          osd.45       up   1.00000  1.00000
 51    hdd   14.55269          osd.51       up   1.00000  1.00000
 56    hdd   14.55269          osd.56       up   1.00000  1.00000
 61    hdd   14.55269          osd.61       up   1.00000  1.00000
 65    hdd   14.55269          osd.65       up   1.00000  1.00000
 69    hdd   14.55269          osd.69       up   1.00000  1.00000
 74    ssd    0.46579          osd.74       up   1.00000  1.00000
 76    ssd    0.46579          osd.76       up   1.00000  1.00000
-34         175.56384      host stg04
 10    hdd   14.55269          osd.10       up   1.00000  1.00000
 16    hdd   14.55269          osd.16       up   1.00000  1.00000
 22    hdd   14.55269          osd.22       up   1.00000  1.00000
 27    hdd   14.55269          osd.27       up   1.00000  1.00000
 37    hdd   14.55269          osd.37       up   1.00000  1.00000
 42    hdd   14.55269          osd.42       up   1.00000  1.00000
 47    hdd   14.55269          osd.47       up   1.00000  1.00000
 54    hdd   14.55269          osd.54       up   1.00000  1.00000
 59    hdd   14.55269          osd.59       up   1.00000  1.00000
 63    hdd   14.55269          osd.63       up   1.00000  1.00000
 67    hdd   14.55269          osd.67       up   1.00000  1.00000
 71    hdd   14.55269          osd.71       up   1.00000  1.00000
 33    ssd    0.46579          osd.33       up   1.00000  1.00000
 75    ssd    0.46579          osd.75       up   1.00000  1.00000
-28         175.56384      host stg05
  9    hdd   14.55269          osd.9        up         0  1.00000
 15    hdd   14.55269          osd.15       up         0  1.00000
 20    hdd   14.55269          osd.20       up         0  1.00000
 26    hdd   14.55269          osd.26       up         0  1.00000
 31    hdd   14.55269          osd.31       up         0  1.00000
 38    hdd   14.55269          osd.38       up         0  1.00000
 44    hdd   14.55269          osd.44       up         0  1.00000
 53    hdd   14.55269          osd.53       up         0  1.00000
 57    hdd   14.55269          osd.57       up         0  1.00000
 62    hdd   14.55269          osd.62       up         0  1.00000
 66    hdd   14.55269          osd.66       up         0  1.00000
 73    hdd   14.55269          osd.73       up         0  1.00000
 49    ssd    0.46579          osd.49       up         0  1.00000
 70    ssd    0.46579          osd.70       up         0  1.00000
---

How can I speed up / fix the recovery of this final PG?

Thanks! :)
D
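P.S. In case it helps with diagnosis, this is roughly what I can run against the stuck PG and post here; "<pgid>" is a placeholder for the actual PG id:

---
# Identify the PG that is still backfilling and the OSDs it maps to
ceph pg ls backfilling
ceph pg ls remapped

# Detailed state of that one PG (backfill targets, recovery progress)
ceph pg <pgid> query

# Current throttles that could be capping recovery speed
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active
---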