I see 5 OSDs with 0 CRUSH weight, is that intentional? Notably:

> All the problem pg's are on osd.39.

osd.39 has 0 CRUSH weight, so CRUSH shouldn't be placing any PGs there. Yet there appear to be PGs mapped to the 4x 0-weight OSDs that are up. I had hoped that the health detail would show the PG IDs, and thus the pool(s) they are in.

7 pools is a lot for such a small cluster. In case nobody has called it out yet, with size=2 you run a risk of data unavailability and eventual loss.

You replied to Eugen asserting that rule 0 is the rule for the `ssd` class. Is it possible that

And later you showed a query for PG 28.42 — you must have a LOT of pools, or have created and removed a lot in the past. So at least some, and perhaps all, of the problem PGs are in pool #28. Please send `ceph osd dump | grep pool`; I don't see that in the thread so far. Similarly `ceph osd tree`, which I don't think we've seen yet. (A quick way to tie the stuck PGs to their pool is sketched below.)

Your cluster is … more complicated than I had expected. I see that a number of your OSDs have the class 'old-ssd'; I'll guess those are the smaller ones? Mind you it's Sunday morning for me (yawwwn), but I see 4x OSDs with the device class 'ssd' that appear to be directly under the CRUSH root, vs. being under a host bucket. That could be part of the problem as well.

25 ssd 0.28319 1.00000 290 GiB 152 GiB 151 GiB 2.8 MiB 1020 MiB 138 GiB 52.52 1.06 39 up
41 ssd 3.49309 1.00000 3.5 TiB 1.6 TiB 1.6 TiB 22 MiB 5.4 GiB 1.9 TiB 44.58 0.90 411 up
38 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 41 up
0 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
36 ssd 0 1.00000 290 GiB 1.3 GiB 23 MiB 2.8 MiB 1.2 GiB 289 GiB 0.44 0.01 13 up
37 ssd 0 1.00000 290 GiB 1.1 GiB 23 MiB 2.5 MiB 1.1 GiB 289 GiB 0.38 0.01 18 up
39 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 86 up

The SSD OSDs are really small, but you know that - just be aware that overhead is such that you won't be able to fill them as much as you might think. This is one of the wrinkles of having OSDs of considerably different sizes used for a common pool.

You didn't ask about performance, but the 3.84TB SSDs are getting 12x the op workload of the 300GB SSDs. That may not be all bad, since a 300GB SSD may well be old, client-class, and/or have low performance/endurance.

>> Disabling mclock as described here https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/ might help
> I cannot see any option that allows me to disable mclock...
> The option to enable override is what I meant, also cf. later replies to this thread.

    "choose_total_tries": 50,

Upping this to, say, 100 may help; I've seen placement snags on very small clusters. (The decompile/edit/recompile workflow is sketched below.)

>>
>> Also, you have a small cluster with a bunch of small OSDs. Please send `ceph health detail`
> # ceph health detail
> HEALTH_WARN 1 pool(s) have no replicas configured
> [WRN] POOL_NO_REDUNDANCY: 1 pool(s) have no replicas configured
> pool 'backups' has no replicas configured
>
> The backups pool is on spinners and 1 copy is sufficient since we're also replicating this offsite in real time. Otherwise too many copies take too much space.
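In the meantime, a quick sketch for tying the stuck PGs to their pool and to osd.39 — nothing exotic, just stock CLI; adjust to taste:

    # list the PGs currently mapped to osd.39, with their state, up set, and acting set
    ceph pg ls-by-osd 39

    # the number before the dot in a PG ID is the pool ID (e.g. 28 in 28.42);
    # this maps pool IDs back to names, sizes, and CRUSH rules
    ceph osd dump | grep pool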
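On the mclock point, to be concrete: the override I had in mind is the one described in the Reef doc linked above. Treat this as a sketch — the backfill value is just an example, not a recommendation for your cluster:

    # let manually-set recovery/backfill limits take effect under the mclock scheduler
    ceph config set osd osd_mclock_override_recovery_settings true

    # then, for example, nudge backfill up a little
    ceph config set osd osd_max_backfills 2

    # the bigger hammer is switching the scheduler back to wpq, which needs an OSD restart:
    # ceph config set osd osd_op_queue wpq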
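And for choose_total_tries, the decompile / edit / recompile workflow from the crush-map-edits doc looks roughly like this (file names here are just placeholders; sanity-check with crushtool before injecting):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # in crushmap.txt, change (or add, if absent) the tunable line:
    #   tunable choose_total_tries 50   ->   tunable choose_total_tries 100

    crushtool -c crushmap.txt -o crushmap.new

    # simulate placements for rule 0 with 2 replicas and look for failures before injecting
    crushtool -i crushmap.new --test --rule 0 --num-rep 2 --show-bad-mappings

    ceph osd setcrushmap -i crushmap.new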
>
>>
>> Please also send `ceph osd df` and `ceph osd crush dump`
> NodeC:~# ceph osd df
> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
> 2 hdd 1.86029 1.00000 1.9 TiB 1.1 TiB 1.1 TiB 16 MiB 2.7 GiB 746 GiB 60.97 1.23 68 up
> 3 hdd 1.86029 1.00000 1.9 TiB 949 GiB 902 GiB 12 MiB 2.6 GiB 961 GiB 49.68 1.00 57 up
> 4 hdd 1.86029 1.00000 1.9 TiB 848 GiB 800 GiB 10 MiB 2.3 GiB 1.0 TiB 44.37 0.89 51 up
> 5 hdd 1.86589 1.00000 1.9 TiB 1.0 TiB 982 GiB 15 MiB 2.4 GiB 881 GiB 53.87 1.08 66 up
> 0 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
> 1 ssd 0.28319 1.00000 290 GiB 123 GiB 122 GiB 2.3 MiB 1.2 GiB 167 GiB 42.38 0.85 46 up
> 36 ssd 0 1.00000 290 GiB 1.3 GiB 23 MiB 2.8 MiB 1.2 GiB 289 GiB 0.44 0.01 13 up
> 37 ssd 0 1.00000 290 GiB 1.1 GiB 23 MiB 2.5 MiB 1.1 GiB 289 GiB 0.38 0.01 18 up
> 38 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 41 up
> 39 ssd 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 86 up
> 40 ssd 3.30690 1.00000 3.3 TiB 1.7 TiB 1.7 TiB 23 MiB 4.8 GiB 1.6 TiB 50.49 1.02 424 up
> 10 hdd 1.86029 1.00000 1.9 TiB 1.0 TiB 990 GiB 13 MiB 2.8 GiB 873 GiB 54.19 1.09 64 up
> 11 hdd 1.86029 1.00000 1.9 TiB 1.3 TiB 1.2 TiB 15 MiB 3.2 GiB 625 GiB 67.20 1.35 72 up
> 26 hdd 1.86029 1.00000 1.9 TiB 843 GiB 801 GiB 12 MiB 2.3 GiB 1.0 TiB 44.25 0.89 50 up
> 27 hdd 1.86029 1.00000 1.9 TiB 1.1 TiB 1.0 TiB 16 MiB 2.7 GiB 793 GiB 58.38 1.18 70 up
> 6 ssd 0.28319 1.00000 290 GiB 115 GiB 115 GiB 2.3 MiB 743 MiB 175 GiB 39.82 0.80 28 up
> 7 ssd 0.28319 1.00000 290 GiB 94 GiB 93 GiB 2.0 MiB 921 MiB 196 GiB 32.51 0.65 23 up
> 8 ssd 0.28319 1.00000 290 GiB 103 GiB 102 GiB 2.1 MiB 963 MiB 187 GiB 35.46 0.71 27 up
> 9 ssd 0.28319 1.00000 290 GiB 131 GiB 130 GiB 2.5 MiB 1.1 GiB 159 GiB 45.28 0.91 33 up
> 24 ssd 0.28319 1.00000 290 GiB 102 GiB 101 GiB 2.1 MiB 719 MiB 188 GiB 35.10 0.71 26 up
> 25 ssd 0.28319 1.00000 290 GiB 152 GiB 151 GiB 2.8 MiB 1020 MiB 138 GiB 52.52 1.06 39 up
> 41 ssd 3.49309 1.00000 3.5 TiB 1.6 TiB 1.6 TiB 22 MiB 5.4 GiB 1.9 TiB 44.58 0.90 411 up
> 14 hdd 1.59999 1.00000 1.9 TiB 1.0 TiB 1.0 TiB 13 MiB 2.4 GiB 837 GiB 56.04 1.13 63 up
> 15 hdd 1.86029 1.00000 1.9 TiB 1017 GiB 975 GiB 13 MiB 2.1 GiB 888 GiB 53.36 1.07 60 up
> 16 hdd 1.86029 1.00000 1.9 TiB 990 GiB 948 GiB 13 MiB 2.2 GiB 915 GiB 51.97 1.05 59 up
> 17 hdd 1.86029 1.00000 1.9 TiB 1.0 TiB 992 GiB 14 MiB 2.5 GiB 871 GiB 54.28 1.09 64 up
> 12 ssd 0.28319 1.00000 290 GiB 122 GiB 121 GiB 2.3 MiB 1.1 GiB 168 GiB 41.96 0.84 31 up
> 13 ssd 0.28319 1.00000 290 GiB 145 GiB 144 GiB 2.6 MiB 1.2 GiB 145 GiB 49.95 1.01 37 up
> 28 ssd 0.28319 1.00000 290 GiB 142 GiB 141 GiB 2.6 MiB 1.1 GiB 148 GiB 48.89 0.98 35 up
> 29 ssd 0.28319 1.00000 290 GiB 109 GiB 108 GiB 2.2 MiB 1.3 GiB 181 GiB 37.58 0.76 29 up
> 30 ssd 0.28319 1.00000 290 GiB 146 GiB 145 GiB 2.7 MiB 1.3 GiB 144 GiB 50.37 1.01 37 up
> 31 ssd 0.28319 1.00000 290 GiB 131 GiB 130 GiB 2.5 MiB 1.3 GiB 159 GiB 45.24 0.91 34 up
> 43 ssd 3.30690 1.00000 3.3 TiB 1.4 TiB 1.4 TiB 19 MiB 4.0 GiB 1.9 TiB 42.50 0.86 353 up
> 20 hdd 1.86029 1.00000 1.9 TiB 1.2 TiB 1.2 TiB 12 MiB 2.8 GiB 659 GiB 65.38 1.32 67 up
> 21 hdd 1.86029 1.00000 1.9 TiB 1.1 TiB 1.0 TiB 15 MiB 2.8 GiB 815 GiB 57.20 1.15 68 up
> 22 hdd 1.86029 1.00000 1.9 TiB 878 GiB 836 GiB 13 MiB 2.4 GiB 1.0 TiB 46.10 0.93 54 up
> 23 hdd 1.86029 1.00000 1.9 TiB 1018 GiB 977 GiB 14 MiB 2.8 GiB 886 GiB 53.46 1.08 59 up
> 18 ssd 0.28319 1.00000 290 GiB 115 GiB 114 GiB 2.3 MiB 1.3 GiB 175 GiB 39.74 0.80 29 up
> 19 ssd 0.28319 1.00000 290 GiB 115 GiB 114 GiB 2.3 MiB 961 MiB 175 GiB 39.59 0.80 28 up
> 32 ssd 0.28319 1.00000 290 GiB 103 GiB 102
GiB 2.1 MiB 1.0 GiB 187 GiB 35.65 0.72 26 up > 33 ssd 0.28319 1.00000 290 GiB 98 GiB 97 GiB 2.1 MiB 1.4 GiB 192 GiB 33.92 0.68 26 up > 34 ssd 0.28319 1.00000 290 GiB 141 GiB 140 GiB 2.6 MiB 987 MiB 149 GiB 48.49 0.98 35 up > 35 ssd 0.28319 1.00000 290 GiB 116 GiB 114 GiB 2.3 MiB 1.3 GiB 174 GiB 39.92 0.80 31 up > 42 ssd 3.49309 1.00000 3.5 TiB 1.4 TiB 1.4 TiB 20 MiB 4.5 GiB 2.1 TiB 40.45 0.81 372 up > TOTAL 49 TiB 25 TiB 24 TiB 351 MiB 83 GiB 25 TiB 49.67 > MIN/MAX VAR: 0.01/1.35 STDDEV: 13.87 > > NodeC:~# ceph osd crush dump > { > "devices": [ > { > "id": 0, > "name": "osd.0", > "class": "ssd" > }, > { > "id": 1, > "name": "osd.1", > "class": "ssd" > }, > { > "id": 2, > "name": "osd.2", > "class": "hdd" > }, > { > "id": 3, > "name": "osd.3", > "class": "hdd" > }, > { > "id": 4, > "name": "osd.4", > "class": "hdd" > }, > { > "id": 5, > "name": "osd.5", > "class": "hdd" > }, > { > "id": 6, > "name": "osd.6", > "class": "ssd" > }, > { > "id": 7, > "name": "osd.7", > "class": "ssd" > }, > { > "id": 8, > "name": "osd.8", > "class": "ssd" > }, > { > "id": 9, > "name": "osd.9", > "class": "ssd" > }, > { > "id": 10, > "name": "osd.10", > "class": "hdd" > }, > { > "id": 11, > "name": "osd.11", > "class": "hdd" > }, > { > "id": 12, > "name": "osd.12", > "class": "ssd" > }, > { > "id": 13, > "name": "osd.13", > "class": "ssd" > }, > { > "id": 14, > "name": "osd.14", > "class": "hdd" > }, > { > "id": 15, > "name": "osd.15", > "class": "hdd" > }, > { > "id": 16, > "name": "osd.16", > "class": "hdd" > }, > { > "id": 17, > "name": "osd.17", > "class": "hdd" > }, > { > "id": 18, > "name": "osd.18", > "class": "ssd" > }, > { > "id": 19, > "name": "osd.19", > "class": "ssd" > }, > { > "id": 20, > "name": "osd.20", > "class": "hdd" > }, > { > "id": 21, > "name": "osd.21", > "class": "hdd" > }, > { > "id": 22, > "name": "osd.22", > "class": "hdd" > }, > { > "id": 23, > "name": "osd.23", > "class": "hdd" > }, > { > "id": 24, > "name": "osd.24", > "class": "ssd" > }, > { > "id": 25, > "name": "osd.25", > "class": "ssd" > }, > { > "id": 26, > "name": "osd.26", > "class": "hdd" > }, > { > "id": 27, > "name": "osd.27", > "class": "hdd" > }, > { > "id": 28, > "name": "osd.28", > "class": "ssd" > }, > { > "id": 29, > "name": "osd.29", > "class": "ssd" > }, > { > "id": 30, > "name": "osd.30", > "class": "ssd" > }, > { > "id": 31, > "name": "osd.31", > "class": "ssd" > }, > { > "id": 32, > "name": "osd.32", > "class": "ssd" > }, > { > "id": 33, > "name": "osd.33", > "class": "ssd" > }, > { > "id": 34, > "name": "osd.34", > "class": "ssd" > }, > { > "id": 35, > "name": "osd.35", > "class": "ssd" > }, > { > "id": 36, > "name": "osd.36", > "class": "ssd" > }, > { > "id": 37, > "name": "osd.37", > "class": "ssd" > }, > { > "id": 38, > "name": "osd.38", > "class": "ssd" > }, > { > "id": 39, > "name": "osd.39", > "class": "ssd" > }, > { > "id": 40, > "name": "osd.40", > "class": "ssd" > }, > { > "id": 41, > "name": "osd.41", > "class": "ssd" > }, > { > "id": 42, > "name": "osd.42", > "class": "ssd" > }, > { > "id": 43, > "name": "osd.43", > "class": "ssd" > } > ], > "types": [ > { > "type_id": 0, > "name": "osd" > }, > { > "type_id": 1, > "name": "host" > }, > { > "type_id": 2, > "name": "chassis" > }, > { > "type_id": 3, > "name": "rack" > }, > { > "type_id": 4, > "name": "row" > }, > { > "type_id": 5, > "name": "pdu" > }, > { > "type_id": 6, > "name": "pod" > }, > { > "type_id": 7, > "name": "room" > }, > { > "type_id": 8, > "name": "datacenter" > }, > { > "type_id": 9, > "name": "zone" > }, > { > "type_id": 10, > 
"name": "region" > }, > { > "type_id": 11, > "name": "root" > } > ], > "buckets": [ > { > "id": -1, > "name": "default", > "type_id": 11, > "type_name": "root", > "weight": 3177873, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": -3, > "weight": 723311, > "pos": 0 > }, > { > "id": -7, > "weight": 827941, > "pos": 1 > }, > { > "id": -10, > "weight": 798680, > "pos": 2 > }, > { > "id": -13, > "weight": 827941, > "pos": 3 > } > ] > }, > { > "id": -2, > "name": "default~ssd", > "type_id": 11, > "type_name": "root", > "weight": 1243909, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": -4, > "weight": 235280, > "pos": 0 > }, > { > "id": -8, > "weight": 340277, > "pos": 1 > }, > { > "id": -11, > "weight": 328075, > "pos": 2 > }, > { > "id": -14, > "weight": 340277, > "pos": 3 > } > ] > }, > { > "id": -3, > "name": "FT1-NodeA", > "type_id": 1, > "type_name": "host", > "weight": 723311, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 1, > "weight": 18559, > "pos": 0 > }, > { > "id": 36, > "weight": 0, > "pos": 1 > }, > { > "id": 37, > "weight": 0, > "pos": 2 > }, > { > "id": 0, > "weight": 0, > "pos": 3 > }, > { > "id": 38, > "weight": 0, > "pos": 4 > }, > { > "id": 39, > "weight": 0, > "pos": 5 > }, > { > "id": 2, > "weight": 121916, > "pos": 6 > }, > { > "id": 3, > "weight": 121916, > "pos": 7 > }, > { > "id": 4, > "weight": 121916, > "pos": 8 > }, > { > "id": 40, > "weight": 216721, > "pos": 9 > }, > { > "id": 5, > "weight": 122283, > "pos": 10 > } > ] > }, > { > "id": -4, > "name": "FT1-NodeA~ssd", > "type_id": 1, > "type_name": "host", > "weight": 235280, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 1, > "weight": 18559, > "pos": 0 > }, > { > "id": 36, > "weight": 0, > "pos": 1 > }, > { > "id": 37, > "weight": 0, > "pos": 2 > }, > { > "id": 0, > "weight": 0, > "pos": 3 > }, > { > "id": 38, > "weight": 0, > "pos": 4 > }, > { > "id": 39, > "weight": 0, > "pos": 5 > }, > { > "id": 40, > "weight": 216721, > "pos": 6 > } > ] > }, > { > "id": -5, > "name": "FT1-NodeA~hdd", > "type_id": 1, > "type_name": "host", > "weight": 488031, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 2, > "weight": 121916, > "pos": 0 > }, > { > "id": 3, > "weight": 121916, > "pos": 1 > }, > { > "id": 4, > "weight": 121916, > "pos": 2 > }, > { > "id": 5, > "weight": 122283, > "pos": 3 > } > ] > }, > { > "id": -6, > "name": "default~hdd", > "type_id": 11, > "type_name": "root", > "weight": 1933964, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": -5, > "weight": 488031, > "pos": 0 > }, > { > "id": -9, > "weight": 487664, > "pos": 1 > }, > { > "id": -12, > "weight": 470605, > "pos": 2 > }, > { > "id": -15, > "weight": 487664, > "pos": 3 > } > ] > }, > { > "id": -7, > "name": "FT1-NodeB", > "type_id": 1, > "type_name": "host", > "weight": 827941, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 6, > "weight": 18559, > "pos": 0 > }, > { > "id": 24, > "weight": 18559, > "pos": 1 > }, > { > "id": 25, > "weight": 18559, > "pos": 2 > }, > { > "id": 11, > "weight": 121916, > "pos": 3 > }, > { > "id": 10, > "weight": 121916, > "pos": 4 > }, > { > "id": 7, > "weight": 18559, > "pos": 5 > }, > { > "id": 8, > "weight": 18559, > "pos": 6 > }, > { > "id": 9, > "weight": 18559, > "pos": 7 > }, > { > "id": 26, > "weight": 121916, > "pos": 8 > }, > { > "id": 27, > "weight": 121916, > "pos": 9 > }, > { > "id": 41, > "weight": 228923, > "pos": 10 > } > ] > }, > { > "id": -8, > "name": "FT1-NodeB~ssd", > 
"type_id": 1, > "type_name": "host", > "weight": 340277, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 6, > "weight": 18559, > "pos": 0 > }, > { > "id": 24, > "weight": 18559, > "pos": 1 > }, > { > "id": 25, > "weight": 18559, > "pos": 2 > }, > { > "id": 7, > "weight": 18559, > "pos": 3 > }, > { > "id": 8, > "weight": 18559, > "pos": 4 > }, > { > "id": 9, > "weight": 18559, > "pos": 5 > }, > { > "id": 41, > "weight": 228923, > "pos": 6 > } > ] > }, > { > "id": -9, > "name": "FT1-NodeB~hdd", > "type_id": 1, > "type_name": "host", > "weight": 487664, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 11, > "weight": 121916, > "pos": 0 > }, > { > "id": 10, > "weight": 121916, > "pos": 1 > }, > { > "id": 26, > "weight": 121916, > "pos": 2 > }, > { > "id": 27, > "weight": 121916, > "pos": 3 > } > ] > }, > { > "id": -10, > "name": "FT1-NodeC", > "type_id": 1, > "type_name": "host", > "weight": 798680, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 13, > "weight": 18559, > "pos": 0 > }, > { > "id": 28, > "weight": 18559, > "pos": 1 > }, > { > "id": 29, > "weight": 18559, > "pos": 2 > }, > { > "id": 14, > "weight": 104857, > "pos": 3 > }, > { > "id": 12, > "weight": 18559, > "pos": 4 > }, > { > "id": 30, > "weight": 18559, > "pos": 5 > }, > { > "id": 31, > "weight": 18559, > "pos": 6 > }, > { > "id": 15, > "weight": 121916, > "pos": 7 > }, > { > "id": 16, > "weight": 121916, > "pos": 8 > }, > { > "id": 17, > "weight": 121916, > "pos": 9 > }, > { > "id": 43, > "weight": 216721, > "pos": 10 > } > ] > }, > { > "id": -11, > "name": "FT1-NodeC~ssd", > "type_id": 1, > "type_name": "host", > "weight": 328075, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 13, > "weight": 18559, > "pos": 0 > }, > { > "id": 28, > "weight": 18559, > "pos": 1 > }, > { > "id": 29, > "weight": 18559, > "pos": 2 > }, > { > "id": 12, > "weight": 18559, > "pos": 3 > }, > { > "id": 30, > "weight": 18559, > "pos": 4 > }, > { > "id": 31, > "weight": 18559, > "pos": 5 > }, > { > "id": 43, > "weight": 216721, > "pos": 6 > } > ] > }, > { > "id": -12, > "name": "FT1-NodeC~hdd", > "type_id": 1, > "type_name": "host", > "weight": 470605, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 14, > "weight": 104857, > "pos": 0 > }, > { > "id": 15, > "weight": 121916, > "pos": 1 > }, > { > "id": 16, > "weight": 121916, > "pos": 2 > }, > { > "id": 17, > "weight": 121916, > "pos": 3 > } > ] > }, > { > "id": -13, > "name": "FT1-NodeD", > "type_id": 1, > "type_name": "host", > "weight": 827941, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 18, > "weight": 18559, > "pos": 0 > }, > { > "id": 32, > "weight": 18559, > "pos": 1 > }, > { > "id": 33, > "weight": 18559, > "pos": 2 > }, > { > "id": 22, > "weight": 121916, > "pos": 3 > }, > { > "id": 19, > "weight": 18559, > "pos": 4 > }, > { > "id": 34, > "weight": 18559, > "pos": 5 > }, > { > "id": 35, > "weight": 18559, > "pos": 6 > }, > { > "id": 23, > "weight": 121916, > "pos": 7 > }, > { > "id": 21, > "weight": 121916, > "pos": 8 > }, > { > "id": 20, > "weight": 121916, > "pos": 9 > }, > { > "id": 42, > "weight": 228923, > "pos": 10 > } > ] > }, > { > "id": -14, > "name": "FT1-NodeD~ssd", > "type_id": 1, > "type_name": "host", > "weight": 340277, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 18, > "weight": 18559, > "pos": 0 > }, > { > "id": 32, > "weight": 18559, > "pos": 1 > }, > { > "id": 33, > "weight": 18559, > "pos": 2 > }, > { > "id": 19, > "weight": 18559, > 
"pos": 3 > }, > { > "id": 34, > "weight": 18559, > "pos": 4 > }, > { > "id": 35, > "weight": 18559, > "pos": 5 > }, > { > "id": 42, > "weight": 228923, > "pos": 6 > } > ] > }, > { > "id": -15, > "name": "FT1-NodeD~hdd", > "type_id": 1, > "type_name": "host", > "weight": 487664, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": 22, > "weight": 121916, > "pos": 0 > }, > { > "id": 23, > "weight": 121916, > "pos": 1 > }, > { > "id": 21, > "weight": 121916, > "pos": 2 > }, > { > "id": 20, > "weight": 121916, > "pos": 3 > } > ] > }, > { > "id": -16, > "name": "FT1-NodeA~old-ssd", > "type_id": 1, > "type_name": "host", > "weight": 0, > "alg": "straw2", > "hash": "rjenkins1", > "items": [] > }, > { > "id": -17, > "name": "FT1-NodeB~old-ssd", > "type_id": 1, > "type_name": "host", > "weight": 0, > "alg": "straw2", > "hash": "rjenkins1", > "items": [] > }, > { > "id": -18, > "name": "FT1-NodeC~old-ssd", > "type_id": 1, > "type_name": "host", > "weight": 0, > "alg": "straw2", > "hash": "rjenkins1", > "items": [] > }, > { > "id": -19, > "name": "FT1-NodeD~old-ssd", > "type_id": 1, > "type_name": "host", > "weight": 0, > "alg": "straw2", > "hash": "rjenkins1", > "items": [] > }, > { > "id": -20, > "name": "default~old-ssd", > "type_id": 11, > "type_name": "root", > "weight": 0, > "alg": "straw2", > "hash": "rjenkins1", > "items": [ > { > "id": -16, > "weight": 0, > "pos": 0 > }, > { > "id": -17, > "weight": 0, > "pos": 1 > }, > { > "id": -18, > "weight": 0, > "pos": 2 > }, > { > "id": -19, > "weight": 0, > "pos": 3 > } > ] > } > ], > "rules": [ > { > "rule_id": 0, > "rule_name": "ssd_rule", > "type": 1, > "steps": [ > { > "op": "take", > "item": -2, > "item_name": "default~ssd" > }, > { > "op": "chooseleaf_firstn", > "num": 0, > "type": "host" > }, > { > "op": "emit" > } > ] > }, > { > "rule_id": 1, > "rule_name": "hdd_rule", > "type": 1, > "steps": [ > { > "op": "take", > "item": -6, > "item_name": "default~hdd" > }, > { > "op": "chooseleaf_firstn", > "num": 0, > "type": "host" > }, > { > "op": "emit" > } > ] > }, > { > "rule_id": 2, > "rule_name": "old_nvme", > "type": 1, > "steps": [ > { > "op": "take", > "item": -20, > "item_name": "default~old-ssd" > }, > { > "op": "chooseleaf_firstn", > "num": 0, > "type": "host" > }, > { > "op": "emit" > } > ] > } > ], > "tunables": { > "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, > "chooseleaf_stable": 1, > "straw_calc_version": 1, > "allowed_bucket_algs": 54, > "profile": "jewel", > "optimal_tunables": 1, > "legacy_tunables": 0, > "minimum_required_version": "jewel", > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "has_v2_rules": 0, > "require_feature_tunables3": 1, > "has_v3_rules": 0, > "has_v4_buckets": 1, > "require_feature_tunables5": 1, > "has_v5_rules": 0 > }, > "choose_args": {} > } > > > >> >> It may help to manually edit the CRUSH map as detailed here: >> >> https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/ >> >> >> At the end of the decompiled CRUSH map is a “tunables” section. Look for the “choose_total_tries” >> line, which should have the default value of 50. Change that to 100. >> >> >> >>> On Nov 16, 2024, at 11:13 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> <mailto:roland@xxxxxxxxxxxxxx> wrote: >>> >>> On 2024/11/15 16:37, Anthony D'Atri wrote: >>>> Only 2 OSDs in the acting set. This isn’t a size=2 pool is it? Did you restart osd.39? >>> Restarted both repeatedly. Has no effect. 
>>>
>>>
>>>
>>>>
>>>>> On Nov 15, 2024, at 9:32 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> "acting": [
>>>>> 39,
>>>>> 1
>>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx