Re: The effect of changing an osd's class

Glad you’re sorted out.  I had a feeling it was a function of not being able to satisfy pool / rule constraints.

> On Nov 18, 2024, at 1:58 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
> 
> On 2024/11/17 18:12, Anthony D'Atri wrote:
>> I see 5 OSDs with 0 CRUSH weight, is that intentional?
> 
> Yes, I set the weight to 0 to ensure all the PGs are removed from them, since I'm removing them (worn-out SSDs).
> 
> I think I found the problem.  I had created a CRUSH rule called old_ssd (and a corresponding pool) into which I had earlier attempted to move the SSDs by changing their respective device class to a custom class I created.  Although I had removed the devices from that new class, the rule and pool remained.  Eventually I saw that there was actually ghost data assigned to that pool.  Since I was sure there were no legitimate PGs in the pool, I deleted it and voila! All the "stuck" PGs vanished.  I have now removed those SSDs from their pool again and so far so good.
> 
>> 
>> Notably:
>> 
>>> All the problem pg's are on osd.39.
>> osd.39 has 0 CRUSH weight, so CRUSH shouldn’t be placing any PGs there.  Yet there appear to be PGs mapped to the 4x 0 weight OSDs that are up.  I had hoped that the health detail would show the PG IDs, and thus the pool(s) they are in.  7 pools is a lot for such a small cluster.  In case nobody has called it out yet, with size=2 you run a risk of data unavailability and eventual loss.
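
A minimal sketch of the check described above: flag OSDs that are up with a CRUSH weight of 0 yet still report PGs. The rows are copied from the `ceph osd df` output quoted in this thread; the function name and the tuple layout are hypothetical, chosen only for illustration.

```python
# Rows copied from the `ceph osd df` output quoted in this thread:
# (osd id, CRUSH weight, PG count, status)
OSD_ROWS = [
    (0, 0.0, 0, "down"),
    (1, 0.28319, 46, "up"),
    (36, 0.0, 13, "up"),
    (37, 0.0, 18, "up"),
    (38, 0.0, 41, "up"),
    (39, 0.0, 86, "up"),
    (40, 3.30690, 424, "up"),
]


def zero_weight_osds_with_pgs(rows):
    """Return (osd id, PG count) for up OSDs with CRUSH weight 0 that still hold PGs."""
    return [
        (osd_id, pgs)
        for osd_id, weight, pgs, status in rows
        if weight == 0 and pgs > 0 and status == "up"
    ]


for osd_id, pgs in zero_weight_osds_with_pgs(OSD_ROWS):
    print(f"osd.{osd_id}: CRUSH weight 0 but {pgs} PGs still mapped")
```

On these rows the check reports the four up OSDs with zero weight, which is the anomaly that pointed toward a leftover pool or rule.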
>> 
>> You replied to Eugen asserting that rule 0 is the rule for the `ssd` class.  Is it possible that ...?  Later you showed a query for PG 28.42, so you must have a LOT of pools, or have created and removed a lot in the past.  So at least some and perhaps all of the problem PGs are in pool #28.
>> 
>> Please send `ceph osd dump | grep pool` ; I don’t see that in the thread so far.  Similarly `ceph osd tree` which I don’t think we’ve seen yet.  Your cluster is … more complicated than I had expected.
>> 
>> I see that a number of your OSDs have the class ‘old-ssd’, I’ll guess those are the smaller ones?
>> 
>> Mind you it’s Sunday morning for me (yawwwn) but I see 4x OSDs with the device class ‘ssd’ that appear to be directly under the CRUSH root, vs being under a host bucket.  That could be part of the problem as well.
>> 
>> 
>> 
>> 
>> 
>> 25    ssd  0.28319   1.00000  290 GiB   152 GiB  151 GiB  2.8 MiB  1020 MiB  138 GiB  52.52  1.06   39      up
>> 41    ssd  3.49309   1.00000  3.5 TiB   1.6 TiB  1.6 TiB   22 MiB   5.4 GiB  1.9 TiB  44.58  0.90  411      up
>> 38    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0   41      up
>>  0    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0    0    down
>> 36    ssd        0   1.00000  290 GiB   1.3 GiB   23 MiB  2.8 MiB   1.2 GiB  289 GiB   0.44  0.01   13      up
>> 37    ssd        0   1.00000  290 GiB   1.1 GiB   23 MiB  2.5 MiB   1.1 GiB  289 GiB   0.38  0.01   18      up
>> 39    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0   86      up
>> 
>> The SSD OSDs are really small, but you know that.  Just be aware that overhead is such that you won’t be able to fill them as much as you might think.  This is one of the wrinkles with having OSDs of considerably different sizes used for a common pool.  You didn’t ask about performance, but the 3.84TB SSDs are getting 12x the op workload of the 300GB SSDs.  Which may not be all bad, since a 300GB SSD may well be old, client-class, and/or have low performance/endurance.
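
The 12x figure can be checked against the numbers already in this thread. The sketch below assumes the integer weights in `ceph osd crush dump` are 16.16 fixed-point values (divide by 65536), which matches the WEIGHT column of `ceph osd df` here. It compares osd.25 and osd.41, both in FT1-NodeB; the helper name is hypothetical.

```python
CRUSH_FIXED_POINT = 0x10000  # assumption: crush dump weights are 16.16 fixed point


def crush_weight(raw):
    """Convert a raw integer weight from `ceph osd crush dump` to the decimal CRUSH weight."""
    return raw / CRUSH_FIXED_POINT


small = crush_weight(18559)    # osd.25, 290 GiB SSD
large = crush_weight(228923)   # osd.41, 3.5 TiB SSD

weight_ratio = large / small   # share of data CRUSH aims to place on the large OSD
pg_ratio = 411 / 39            # PGS column from `ceph osd df` for osd.41 and osd.25

print(f"osd.25 weight {small:.5f}, osd.41 weight {large:.5f}")
print(f"weight ratio {weight_ratio:.1f}x, observed PG ratio {pg_ratio:.1f}x")
```

PG count is only a rough proxy for op workload, but it shows why the large SSDs carry roughly an order of magnitude more traffic than the small ones in the same pool.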
>> 
>> 
>>>> Disabling mclock as described here https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/ might help
>>> I cannot see any option that allows me to disable mclock...
>>> 
>> The option to enable override is what I meant, also cf later replies to this thread.
>> 
>> 
>>         "choose_total_tries": 50,
>> 
>> Upping this to say 100 may help.  I’ve seen placement snags on very small clusters.
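
One way to change that tunable is to decompile the CRUSH map, edit the `tunable choose_total_tries` line, then recompile and inject it. The sketch below covers only the text edit. The function name is hypothetical, and the sample text is a shortened illustration, not this cluster's actual map. The round trip with `ceph osd getcrushmap`, `crushtool` and `ceph osd setcrushmap` appears as comments and should be tried on a test cluster first.

```python
import re

# Typical round trip around this edit (file names are placeholders):
#   ceph osd getcrushmap -o crush.bin
#   crushtool -d crush.bin -o crush.txt
#   ... edit crush.txt ...
#   crushtool -c crush.txt -o crush.new
#   ceph osd setcrushmap -i crush.new


def set_choose_total_tries(crush_text, tries):
    """Replace the value on the 'tunable choose_total_tries' line of a decompiled CRUSH map."""
    pattern = re.compile(
        r"^(tunable[ \t]+choose_total_tries[ \t]+)\d+[ \t]*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{tries}", crush_text)
    if count != 1:
        raise ValueError(f"expected exactly one choose_total_tries line, found {count}")
    return new_text


# Shortened, illustrative excerpt of a decompiled map.
SAMPLE = (
    "# begin crush map\n"
    "tunable choose_local_tries 0\n"
    "tunable choose_total_tries 50\n"
    "tunable chooseleaf_descend_once 1\n"
)

print(set_choose_total_tries(SAMPLE, 100))
```

Raising the value lets CRUSH retry more often before giving up on a placement, which mostly matters on small clusters with few failure domains.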
>> 
>>>> Also, you have a small cluster with a bunch of small OSDs.  Please send `ceph health detail`
>>> # ceph health detail
>>> HEALTH_WARN 1 pool(s) have no replicas configured
>>> [WRN] POOL_NO_REDUNDANCY: 1 pool(s) have no replicas configured
>>>     pool 'backups' has no replicas configured
>>> 
>>> The backups pool is on spinners and 1 copy is sufficient since we're also replicating this offsite in real time.  Otherwise too many copies take too much space.
>>> 
>>>> Please also send `ceph osd df` and `ceph osd crush dump`
>>> NodeC:~# ceph osd df
>>> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE   DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
>>>  2    hdd  1.86029   1.00000  1.9 TiB   1.1 TiB  1.1 TiB   16 MiB   2.7 GiB  746 GiB  60.97  1.23   68      up
>>>  3    hdd  1.86029   1.00000  1.9 TiB   949 GiB  902 GiB   12 MiB   2.6 GiB  961 GiB  49.68  1.00   57      up
>>>  4    hdd  1.86029   1.00000  1.9 TiB   848 GiB  800 GiB   10 MiB   2.3 GiB  1.0 TiB  44.37  0.89   51      up
>>>  5    hdd  1.86589   1.00000  1.9 TiB   1.0 TiB  982 GiB   15 MiB   2.4 GiB  881 GiB  53.87  1.08   66      up
>>>  0    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0    0    down
>>>  1    ssd  0.28319   1.00000  290 GiB   123 GiB  122 GiB  2.3 MiB   1.2 GiB  167 GiB  42.38  0.85   46      up
>>> 36    ssd        0   1.00000  290 GiB   1.3 GiB   23 MiB  2.8 MiB   1.2 GiB  289 GiB   0.44  0.01   13      up
>>> 37    ssd        0   1.00000  290 GiB   1.1 GiB   23 MiB  2.5 MiB   1.1 GiB  289 GiB   0.38  0.01   18      up
>>> 38    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0   41      up
>>> 39    ssd        0         0      0 B       0 B      0 B      0 B       0 B      0 B      0     0   86      up
>>> 40    ssd  3.30690   1.00000  3.3 TiB   1.7 TiB  1.7 TiB   23 MiB   4.8 GiB  1.6 TiB  50.49  1.02  424      up
>>> 10    hdd  1.86029   1.00000  1.9 TiB   1.0 TiB  990 GiB   13 MiB   2.8 GiB  873 GiB  54.19  1.09   64      up
>>> 11    hdd  1.86029   1.00000  1.9 TiB   1.3 TiB  1.2 TiB   15 MiB   3.2 GiB  625 GiB  67.20  1.35   72      up
>>> 26    hdd  1.86029   1.00000  1.9 TiB   843 GiB  801 GiB   12 MiB   2.3 GiB  1.0 TiB  44.25  0.89   50      up
>>> 27    hdd  1.86029   1.00000  1.9 TiB   1.1 TiB  1.0 TiB   16 MiB   2.7 GiB  793 GiB  58.38  1.18   70      up
>>>  6    ssd  0.28319   1.00000  290 GiB   115 GiB  115 GiB  2.3 MiB   743 MiB  175 GiB  39.82  0.80   28      up
>>>  7    ssd  0.28319   1.00000  290 GiB    94 GiB   93 GiB  2.0 MiB   921 MiB  196 GiB  32.51  0.65   23      up
>>>  8    ssd  0.28319   1.00000  290 GiB   103 GiB  102 GiB  2.1 MiB   963 MiB  187 GiB  35.46  0.71   27      up
>>>  9    ssd  0.28319   1.00000  290 GiB   131 GiB  130 GiB  2.5 MiB   1.1 GiB  159 GiB  45.28  0.91   33      up
>>> 24    ssd  0.28319   1.00000  290 GiB   102 GiB  101 GiB  2.1 MiB   719 MiB  188 GiB  35.10  0.71   26      up
>>> 25    ssd  0.28319   1.00000  290 GiB   152 GiB  151 GiB  2.8 MiB  1020 MiB  138 GiB  52.52  1.06   39      up
>>> 41    ssd  3.49309   1.00000  3.5 TiB   1.6 TiB  1.6 TiB   22 MiB   5.4 GiB  1.9 TiB  44.58  0.90  411      up
>>> 14    hdd  1.59999   1.00000  1.9 TiB   1.0 TiB  1.0 TiB   13 MiB   2.4 GiB  837 GiB  56.04  1.13   63      up
>>> 15    hdd  1.86029   1.00000  1.9 TiB  1017 GiB  975 GiB   13 MiB   2.1 GiB  888 GiB  53.36  1.07   60      up
>>> 16    hdd  1.86029   1.00000  1.9 TiB   990 GiB  948 GiB   13 MiB   2.2 GiB  915 GiB  51.97  1.05   59      up
>>> 17    hdd  1.86029   1.00000  1.9 TiB   1.0 TiB  992 GiB   14 MiB   2.5 GiB  871 GiB  54.28  1.09   64      up
>>> 12    ssd  0.28319   1.00000  290 GiB   122 GiB  121 GiB  2.3 MiB   1.1 GiB  168 GiB  41.96  0.84   31      up
>>> 13    ssd  0.28319   1.00000  290 GiB   145 GiB  144 GiB  2.6 MiB   1.2 GiB  145 GiB  49.95  1.01   37      up
>>> 28    ssd  0.28319   1.00000  290 GiB   142 GiB  141 GiB  2.6 MiB   1.1 GiB  148 GiB  48.89  0.98   35      up
>>> 29    ssd  0.28319   1.00000  290 GiB   109 GiB  108 GiB  2.2 MiB   1.3 GiB  181 GiB  37.58  0.76   29      up
>>> 30    ssd  0.28319   1.00000  290 GiB   146 GiB  145 GiB  2.7 MiB   1.3 GiB  144 GiB  50.37  1.01   37      up
>>> 31    ssd  0.28319   1.00000  290 GiB   131 GiB  130 GiB  2.5 MiB   1.3 GiB  159 GiB  45.24  0.91   34      up
>>> 43    ssd  3.30690   1.00000  3.3 TiB   1.4 TiB  1.4 TiB   19 MiB   4.0 GiB  1.9 TiB  42.50  0.86  353      up
>>> 20    hdd  1.86029   1.00000  1.9 TiB   1.2 TiB  1.2 TiB   12 MiB   2.8 GiB  659 GiB  65.38  1.32   67      up
>>> 21    hdd  1.86029   1.00000  1.9 TiB   1.1 TiB  1.0 TiB   15 MiB   2.8 GiB  815 GiB  57.20  1.15   68      up
>>> 22    hdd  1.86029   1.00000  1.9 TiB   878 GiB  836 GiB   13 MiB   2.4 GiB  1.0 TiB  46.10  0.93   54      up
>>> 23    hdd  1.86029   1.00000  1.9 TiB  1018 GiB  977 GiB   14 MiB   2.8 GiB  886 GiB  53.46  1.08   59      up
>>> 18    ssd  0.28319   1.00000  290 GiB   115 GiB  114 GiB  2.3 MiB   1.3 GiB  175 GiB  39.74  0.80   29      up
>>> 19    ssd  0.28319   1.00000  290 GiB   115 GiB  114 GiB  2.3 MiB   961 MiB  175 GiB  39.59  0.80   28      up
>>> 32    ssd  0.28319   1.00000  290 GiB   103 GiB  102 GiB  2.1 MiB   1.0 GiB  187 GiB  35.65  0.72   26      up
>>> 33    ssd  0.28319   1.00000  290 GiB    98 GiB   97 GiB  2.1 MiB   1.4 GiB  192 GiB  33.92  0.68   26      up
>>> 34    ssd  0.28319   1.00000  290 GiB   141 GiB  140 GiB  2.6 MiB   987 MiB  149 GiB  48.49  0.98   35      up
>>> 35    ssd  0.28319   1.00000  290 GiB   116 GiB  114 GiB  2.3 MiB   1.3 GiB  174 GiB  39.92  0.80   31      up
>>> 42    ssd  3.49309   1.00000  3.5 TiB   1.4 TiB  1.4 TiB   20 MiB   4.5 GiB  2.1 TiB  40.45  0.81  372      up
>>>                        TOTAL   49 TiB    25 TiB   24 TiB  351 MiB    83 GiB   25 TiB  49.67
>>> MIN/MAX VAR: 0.01/1.35  STDDEV: 13.87
>>> 
>>> NodeC:~# ceph osd crush dump
>>> {
>>>     "devices": [
>>>         {
>>>             "id": 0,
>>>             "name": "osd.0",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 1,
>>>             "name": "osd.1",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 2,
>>>             "name": "osd.2",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 3,
>>>             "name": "osd.3",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 4,
>>>             "name": "osd.4",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 5,
>>>             "name": "osd.5",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 6,
>>>             "name": "osd.6",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 7,
>>>             "name": "osd.7",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 8,
>>>             "name": "osd.8",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 9,
>>>             "name": "osd.9",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 10,
>>>             "name": "osd.10",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 11,
>>>             "name": "osd.11",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 12,
>>>             "name": "osd.12",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 13,
>>>             "name": "osd.13",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 14,
>>>             "name": "osd.14",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 15,
>>>             "name": "osd.15",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 16,
>>>             "name": "osd.16",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 17,
>>>             "name": "osd.17",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 18,
>>>             "name": "osd.18",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 19,
>>>             "name": "osd.19",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 20,
>>>             "name": "osd.20",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 21,
>>>             "name": "osd.21",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 22,
>>>             "name": "osd.22",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 23,
>>>             "name": "osd.23",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 24,
>>>             "name": "osd.24",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 25,
>>>             "name": "osd.25",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 26,
>>>             "name": "osd.26",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 27,
>>>             "name": "osd.27",
>>>             "class": "hdd"
>>>         },
>>>         {
>>>             "id": 28,
>>>             "name": "osd.28",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 29,
>>>             "name": "osd.29",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 30,
>>>             "name": "osd.30",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 31,
>>>             "name": "osd.31",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 32,
>>>             "name": "osd.32",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 33,
>>>             "name": "osd.33",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 34,
>>>             "name": "osd.34",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 35,
>>>             "name": "osd.35",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 36,
>>>             "name": "osd.36",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 37,
>>>             "name": "osd.37",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 38,
>>>             "name": "osd.38",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 39,
>>>             "name": "osd.39",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 40,
>>>             "name": "osd.40",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 41,
>>>             "name": "osd.41",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 42,
>>>             "name": "osd.42",
>>>             "class": "ssd"
>>>         },
>>>         {
>>>             "id": 43,
>>>             "name": "osd.43",
>>>             "class": "ssd"
>>>         }
>>>     ],
>>>     "types": [
>>>         {
>>>             "type_id": 0,
>>>             "name": "osd"
>>>         },
>>>         {
>>>             "type_id": 1,
>>>             "name": "host"
>>>         },
>>>         {
>>>             "type_id": 2,
>>>             "name": "chassis"
>>>         },
>>>         {
>>>             "type_id": 3,
>>>             "name": "rack"
>>>         },
>>>         {
>>>             "type_id": 4,
>>>             "name": "row"
>>>         },
>>>         {
>>>             "type_id": 5,
>>>             "name": "pdu"
>>>         },
>>>         {
>>>             "type_id": 6,
>>>             "name": "pod"
>>>         },
>>>         {
>>>             "type_id": 7,
>>>             "name": "room"
>>>         },
>>>         {
>>>             "type_id": 8,
>>>             "name": "datacenter"
>>>         },
>>>         {
>>>             "type_id": 9,
>>>             "name": "zone"
>>>         },
>>>         {
>>>             "type_id": 10,
>>>             "name": "region"
>>>         },
>>>         {
>>>             "type_id": 11,
>>>             "name": "root"
>>>         }
>>>     ],
>>>     "buckets": [
>>>         {
>>>             "id": -1,
>>>             "name": "default",
>>>             "type_id": 11,
>>>             "type_name": "root",
>>>             "weight": 3177873,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": -3,
>>>                     "weight": 723311,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": -7,
>>>                     "weight": 827941,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": -10,
>>>                     "weight": 798680,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": -13,
>>>                     "weight": 827941,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -2,
>>>             "name": "default~ssd",
>>>             "type_id": 11,
>>>             "type_name": "root",
>>>             "weight": 1243909,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": -4,
>>>                     "weight": 235280,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": -8,
>>>                     "weight": 340277,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": -11,
>>>                     "weight": 328075,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": -14,
>>>                     "weight": 340277,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -3,
>>>             "name": "FT1-NodeA",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 723311,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 1,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 36,
>>>                     "weight": 0,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 37,
>>>                     "weight": 0,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 0,
>>>                     "weight": 0,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 38,
>>>                     "weight": 0,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 39,
>>>                     "weight": 0,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 2,
>>>                     "weight": 121916,
>>>                     "pos": 6
>>>                 },
>>>                 {
>>>                     "id": 3,
>>>                     "weight": 121916,
>>>                     "pos": 7
>>>                 },
>>>                 {
>>>                     "id": 4,
>>>                     "weight": 121916,
>>>                     "pos": 8
>>>                 },
>>>                 {
>>>                     "id": 40,
>>>                     "weight": 216721,
>>>                     "pos": 9
>>>                 },
>>>                 {
>>>                     "id": 5,
>>>                     "weight": 122283,
>>>                     "pos": 10
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -4,
>>>             "name": "FT1-NodeA~ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 235280,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 1,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 36,
>>>                     "weight": 0,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 37,
>>>                     "weight": 0,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 0,
>>>                     "weight": 0,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 38,
>>>                     "weight": 0,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 39,
>>>                     "weight": 0,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 40,
>>>                     "weight": 216721,
>>>                     "pos": 6
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -5,
>>>             "name": "FT1-NodeA~hdd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 488031,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 2,
>>>                     "weight": 121916,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 3,
>>>                     "weight": 121916,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 4,
>>>                     "weight": 121916,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 5,
>>>                     "weight": 122283,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -6,
>>>             "name": "default~hdd",
>>>             "type_id": 11,
>>>             "type_name": "root",
>>>             "weight": 1933964,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": -5,
>>>                     "weight": 488031,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": -9,
>>>                     "weight": 487664,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": -12,
>>>                     "weight": 470605,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": -15,
>>>                     "weight": 487664,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -7,
>>>             "name": "FT1-NodeB",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 827941,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 6,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 24,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 25,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 11,
>>>                     "weight": 121916,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 10,
>>>                     "weight": 121916,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 7,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 8,
>>>                     "weight": 18559,
>>>                     "pos": 6
>>>                 },
>>>                 {
>>>                     "id": 9,
>>>                     "weight": 18559,
>>>                     "pos": 7
>>>                 },
>>>                 {
>>>                     "id": 26,
>>>                     "weight": 121916,
>>>                     "pos": 8
>>>                 },
>>>                 {
>>>                     "id": 27,
>>>                     "weight": 121916,
>>>                     "pos": 9
>>>                 },
>>>                 {
>>>                     "id": 41,
>>>                     "weight": 228923,
>>>                     "pos": 10
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -8,
>>>             "name": "FT1-NodeB~ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 340277,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 6,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 24,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 25,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 7,
>>>                     "weight": 18559,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 8,
>>>                     "weight": 18559,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 9,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 41,
>>>                     "weight": 228923,
>>>                     "pos": 6
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -9,
>>>             "name": "FT1-NodeB~hdd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 487664,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 11,
>>>                     "weight": 121916,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 10,
>>>                     "weight": 121916,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 26,
>>>                     "weight": 121916,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 27,
>>>                     "weight": 121916,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -10,
>>>             "name": "FT1-NodeC",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 798680,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 13,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 28,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 29,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 14,
>>>                     "weight": 104857,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 12,
>>>                     "weight": 18559,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 30,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 31,
>>>                     "weight": 18559,
>>>                     "pos": 6
>>>                 },
>>>                 {
>>>                     "id": 15,
>>>                     "weight": 121916,
>>>                     "pos": 7
>>>                 },
>>>                 {
>>>                     "id": 16,
>>>                     "weight": 121916,
>>>                     "pos": 8
>>>                 },
>>>                 {
>>>                     "id": 17,
>>>                     "weight": 121916,
>>>                     "pos": 9
>>>                 },
>>>                 {
>>>                     "id": 43,
>>>                     "weight": 216721,
>>>                     "pos": 10
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -11,
>>>             "name": "FT1-NodeC~ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 328075,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 13,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 28,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 29,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 12,
>>>                     "weight": 18559,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 30,
>>>                     "weight": 18559,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 31,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 43,
>>>                     "weight": 216721,
>>>                     "pos": 6
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -12,
>>>             "name": "FT1-NodeC~hdd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 470605,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 14,
>>>                     "weight": 104857,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 15,
>>>                     "weight": 121916,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 16,
>>>                     "weight": 121916,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 17,
>>>                     "weight": 121916,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -13,
>>>             "name": "FT1-NodeD",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 827941,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 18,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 32,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 33,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 22,
>>>                     "weight": 121916,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 19,
>>>                     "weight": 18559,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 34,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 35,
>>>                     "weight": 18559,
>>>                     "pos": 6
>>>                 },
>>>                 {
>>>                     "id": 23,
>>>                     "weight": 121916,
>>>                     "pos": 7
>>>                 },
>>>                 {
>>>                     "id": 21,
>>>                     "weight": 121916,
>>>                     "pos": 8
>>>                 },
>>>                 {
>>>                     "id": 20,
>>>                     "weight": 121916,
>>>                     "pos": 9
>>>                 },
>>>                 {
>>>                     "id": 42,
>>>                     "weight": 228923,
>>>                     "pos": 10
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -14,
>>>             "name": "FT1-NodeD~ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 340277,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 18,
>>>                     "weight": 18559,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 32,
>>>                     "weight": 18559,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 33,
>>>                     "weight": 18559,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 19,
>>>                     "weight": 18559,
>>>                     "pos": 3
>>>                 },
>>>                 {
>>>                     "id": 34,
>>>                     "weight": 18559,
>>>                     "pos": 4
>>>                 },
>>>                 {
>>>                     "id": 35,
>>>                     "weight": 18559,
>>>                     "pos": 5
>>>                 },
>>>                 {
>>>                     "id": 42,
>>>                     "weight": 228923,
>>>                     "pos": 6
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -15,
>>>             "name": "FT1-NodeD~hdd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 487664,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": 22,
>>>                     "weight": 121916,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": 23,
>>>                     "weight": 121916,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": 21,
>>>                     "weight": 121916,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": 20,
>>>                     "weight": 121916,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "id": -16,
>>>             "name": "FT1-NodeA~old-ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 0,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": []
>>>         },
>>>         {
>>>             "id": -17,
>>>             "name": "FT1-NodeB~old-ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 0,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": []
>>>         },
>>>         {
>>>             "id": -18,
>>>             "name": "FT1-NodeC~old-ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 0,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": []
>>>         },
>>>         {
>>>             "id": -19,
>>>             "name": "FT1-NodeD~old-ssd",
>>>             "type_id": 1,
>>>             "type_name": "host",
>>>             "weight": 0,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": []
>>>         },
>>>         {
>>>             "id": -20,
>>>             "name": "default~old-ssd",
>>>             "type_id": 11,
>>>             "type_name": "root",
>>>             "weight": 0,
>>>             "alg": "straw2",
>>>             "hash": "rjenkins1",
>>>             "items": [
>>>                 {
>>>                     "id": -16,
>>>                     "weight": 0,
>>>                     "pos": 0
>>>                 },
>>>                 {
>>>                     "id": -17,
>>>                     "weight": 0,
>>>                     "pos": 1
>>>                 },
>>>                 {
>>>                     "id": -18,
>>>                     "weight": 0,
>>>                     "pos": 2
>>>                 },
>>>                 {
>>>                     "id": -19,
>>>                     "weight": 0,
>>>                     "pos": 3
>>>                 }
>>>             ]
>>>         }
>>>     ],
>>>     "rules": [
>>>         {
>>>             "rule_id": 0,
>>>             "rule_name": "ssd_rule",
>>>             "type": 1,
>>>             "steps": [
>>>                 {
>>>                     "op": "take",
>>>                     "item": -2,
>>>                     "item_name": "default~ssd"
>>>                 },
>>>                 {
>>>                     "op": "chooseleaf_firstn",
>>>                     "num": 0,
>>>                     "type": "host"
>>>                 },
>>>                 {
>>>                     "op": "emit"
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "rule_id": 1,
>>>             "rule_name": "hdd_rule",
>>>             "type": 1,
>>>             "steps": [
>>>                 {
>>>                     "op": "take",
>>>                     "item": -6,
>>>                     "item_name": "default~hdd"
>>>                 },
>>>                 {
>>>                     "op": "chooseleaf_firstn",
>>>                     "num": 0,
>>>                     "type": "host"
>>>                 },
>>>                 {
>>>                     "op": "emit"
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "rule_id": 2,
>>>             "rule_name": "old_nvme",
>>>             "type": 1,
>>>             "steps": [
>>>                 {
>>>                     "op": "take",
>>>                     "item": -20,
>>>                     "item_name": "default~old-ssd"
>>>                 },
>>>                 {
>>>                     "op": "chooseleaf_firstn",
>>>                     "num": 0,
>>>                     "type": "host"
>>>                 },
>>>                 {
>>>                     "op": "emit"
>>>                 }
>>>             ]
>>>         }
>>>     ],
>>>     "tunables": {
>>>         "choose_local_tries": 0,
>>>         "choose_local_fallback_tries": 0,
>>>         "choose_total_tries": 50,
>>>         "chooseleaf_descend_once": 1,
>>>         "chooseleaf_vary_r": 1,
>>>         "chooseleaf_stable": 1,
>>>         "straw_calc_version": 1,
>>>         "allowed_bucket_algs": 54,
>>>         "profile": "jewel",
>>>         "optimal_tunables": 1,
>>>         "legacy_tunables": 0,
>>>         "minimum_required_version": "jewel",
>>>         "require_feature_tunables": 1,
>>>         "require_feature_tunables2": 1,
>>>         "has_v2_rules": 0,
>>>         "require_feature_tunables3": 1,
>>>         "has_v3_rules": 0,
>>>         "has_v4_buckets": 1,
>>>         "require_feature_tunables5": 1,
>>>         "has_v5_rules": 0
>>>     },
>>>     "choose_args": {}
>>> }
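
For anyone reading the dump above later: rule 2 (`old_nvme`) takes `default~old-ssd`, a shadow root whose weight is 0 and whose host buckets are all empty, so CRUSH has nowhere to place PGs for any pool still using that rule. Below is a minimal sketch of how one might spot such rules automatically. It assumes the JSON has the same layout as the dump above (top-level `buckets` and `rules` keys, as I believe `ceph osd crush dump` produces); the function name and the trimmed sample data are mine, not anything from Ceph.

```python
import json


def rules_with_empty_take(crush):
    """Return (rule_name, take_target) pairs for rules whose `take` step
    points at a bucket that has zero weight or is missing from the dump."""
    weight_by_id = {b["id"]: b["weight"] for b in crush["buckets"]}
    flagged = []
    for rule in crush["rules"]:
        for step in rule["steps"]:
            if step["op"] != "take":
                continue
            # An unknown bucket id is treated like a zero-weight bucket.
            if weight_by_id.get(step["item"], 0) == 0:
                target = step.get("item_name", step["item"])
                flagged.append((rule["rule_name"], target))
    return flagged


# Trimmed sample modelled on the dump above. The weight for default~hdd
# is a placeholder; only the zero weight of default~old-ssd matters here.
sample = json.loads("""
{
  "buckets": [
    {"id": -6,  "name": "default~hdd",     "weight": 1, "items": []},
    {"id": -20, "name": "default~old-ssd", "weight": 0, "items": []}
  ],
  "rules": [
    {"rule_id": 1, "rule_name": "hdd_rule",
     "steps": [{"op": "take", "item": -6, "item_name": "default~hdd"},
               {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
               {"op": "emit"}]},
    {"rule_id": 2, "rule_name": "old_nvme",
     "steps": [{"op": "take", "item": -20, "item_name": "default~old-ssd"},
               {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
               {"op": "emit"}]}
  ]
}
""")

flagged = rules_with_empty_take(sample)
```

To run it against a live cluster, save the output of `ceph osd crush dump` to a file and pass the parsed JSON to `rules_with_empty_take`. Any pool still assigned to a flagged rule is a likely candidate for PGs that cannot be mapped.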
>>> 
>>> 
>>> 
>>>> It may help to manually edit the CRUSH map as detailed here:
>>>> 
>>>> https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/
>>>> 
>>>> 
>>>> At the end of the decompiled CRUSH map is a “tunables” section.  Look for the “choose_total_tries”
>>>> line, which should have the default value of 50.  Change that to 100.
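
A rough sketch of that edit, in case it helps: the decompiled map stores the setting as a line of the form `tunable choose_total_tries 50`, so the change is a one-line substitution. The helper name below is hypothetical, and the sample text is a trimmed stand-in for a real decompiled map.

```python
import re


def set_choose_total_tries(crush_text, tries=100):
    """Return decompiled CRUSH map text with choose_total_tries set to `tries`.

    Raises ValueError unless exactly one matching tunable line is found,
    so an unexpected file is never modified silently.
    """
    pattern = re.compile(
        r"^(tunable[ \t]+choose_total_tries[ \t]+)\d+[ \t]*$", re.MULTILINE
    )
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new value.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{tries}", crush_text)
    if count != 1:
        raise ValueError(f"expected one choose_total_tries line, found {count}")
    return new_text


# Trimmed stand-in for a decompiled map.
sample_text = (
    "# begin crush map\n"
    "tunable choose_local_tries 0\n"
    "tunable choose_total_tries 50\n"
    "tunable chooseleaf_descend_once 1\n"
)
updated = set_choose_total_tries(sample_text)
```

The surrounding workflow is the one on the page linked above: fetch the map with `ceph osd getcrushmap -o <file>`, decompile it with `crushtool -d`, edit the text, recompile with `crushtool -c`, and inject it with `ceph osd setcrushmap -i <file>`. Keep a copy of the original compiled map so the change can be reverted.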
>>>> 
>>>> 
>>>> 
>>>>> On Nov 16, 2024, at 11:13 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> On 2024/11/15 16:37, Anthony D'Atri wrote:
>>>>>> Only 2 OSDs in the acting set.  This isn’t a size=2 pool is it?  Did you restart osd.39?
>>>>> Restarted both repeatedly.  Has no effect.
>>>>> 
>>>>> 
>>>>> 
>>>>>>> On Nov 15, 2024, at 9:32 AM, Roland Giesler <roland@xxxxxxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>>     "acting": [
>>>>>>>         39,
>>>>>>>         1
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx



