Re: 6 pgs not deep-scrubbed in time


 



I would just set noout for the duration of the reboot; no other flags are really
needed. There is a better option to limit that flag to just the host being
rebooted, which is "ceph osd set-group noout <host>", where <host> is the
server's name in CRUSH. The global noout will suffice, though.
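
For example, a minimal sequence (assuming the host you are rebooting is named
ceph-osd1 in CRUSH, as it appears in your "ceph osd df tree" output):

  ceph osd set-group noout ceph-osd1     # or simply: ceph osd set noout
  # reboot the host, wait for its OSDs to come back up
  ceph osd unset-group noout ceph-osd1   # or: ceph osd unset noout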

Anyways... your "not scrubbed in time" warnings aren't going away in the short
term until you finish the pg split. In fact, they will get more numerous until
the pg split finishes (did you start that?). If you want to get rid of the
"cosmetic" issue of the warning, you can adjust the interval at which the
warning is raised, but I would suggest you leave it alone, since you are trying
to address the root of the situation and want to see it resolve.
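
(For reference, and only if you really want to quiet it: the warning is driven
by the deep scrub interval and a warn ratio, so something like the following
would push it out. Treat this as a rough sketch and double-check the option
names and defaults on your version first:

  ceph config set global osd_deep_scrub_interval 1209600            # default 604800 (1 week)
  ceph config set global mon_warn_pg_not_deep_scrubbed_ratio 1.0    # default 0.75
)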



Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Feb 1, 2024 at 9:16 AM Michel Niyoyita <micou12@xxxxxxxxx> wrote:

> And as said before, the cluster is still in a warning state with pgs not
> deep-scrubbed in time. I hope this can be ignored, and that I can set those two
> flags ("noout" and "nobackfill") and then reboot.
>
> Thank you again Sir
>
> On Thu, 1 Feb 2024, 16:11 Michel Niyoyita, <micou12@xxxxxxxxx> wrote:
>
>> Thank you very much Janne.
>>
>> On Thu, 1 Feb 2024, 15:21 Janne Johansson, <icepic.dz@xxxxxxxxx> wrote:
>>
>>> pause and nodown are not good options to set; they will certainly make
>>> clients stop IO. Pause will stop it immediately, and nodown will stop
>>> IO once the OSD processes stop running on this host.
>>>
>>> When we do service on a host, we set "noout" and "nobackfill", that is
>>> enough for reboots, OS upgrades and simple disk exchanges.
>>> The PGs on this one host will be degraded during the down period, but
>>> IO continues.
>>> Of course this is when the cluster was healthy to begin with (not
>>> counting "not scrubbed in time" warnings, they don't matter in this
>>> case.)
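>>>
>>> For example, the whole sequence is just:
>>>
>>>   ceph osd set noout
>>>   ceph osd set nobackfill
>>>   (reboot / service the host, wait for its OSDs to come back up)
>>>   ceph osd unset nobackfill
>>>   ceph osd unset noout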
>>>
>>>
>>>
>>> Den tors 1 feb. 2024 kl 12:21 skrev Michel Niyoyita <micou12@xxxxxxxxx>:
>>> >
>>> > Thanks very much Wesley,
>>> >
>>> > We have decided to restart one host among the three OSD hosts. Before doing
>>> > that I need the advice of the team. These are the flags I want to set before
>>> > the restart:
>>> >
>>> >  'ceph osd set noout'
>>> >  'ceph osd set nobackfill'
>>> >  'ceph osd set norecover'
>>> >  'ceph osd set norebalance'
>>> > 'ceph osd set nodown'
>>> >  'ceph osd set pause'
>>> > 'ceph osd set nodeep-scrub'
>>> > 'ceph osd set noscrub'
>>> >
>>> >
>>> > I would like to ask if these are enough to set in order to restart the host
>>> > safely. The cluster has 3 replicas.
>>> >
>>> > Will the cluster still be accessible while restarting the host? After
>>> > restarting I will unset the flags.
>>> >
>>> > Kindly advise.
>>> >
>>> > Michel
>>> >
>>> >
>>> > On Tue, 30 Jan 2024, 17:44 Wesley Dillingham, <wes@xxxxxxxxxxxxxxxxx>
>>> wrote:
>>> >
>>> > > Actually, it seems the issue I had in mind was fixed in 16.2.11, so you
>>> > > should be fine.
>>> > >
>>> > > Respectfully,
>>> > >
>>> > > *Wes Dillingham*
>>> > > wes@xxxxxxxxxxxxxxxxx
>>> > > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >
>>> > >
>>> > > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham <
>>> wes@xxxxxxxxxxxxxxxxx>
>>> > > wrote:
>>> > >
>>> > >> You may want to consider upgrading to 16.2.14 before you do the pg
>>> split.
>>> > >>
>>> > >> Respectfully,
>>> > >>
>>> > >> *Wes Dillingham*
>>> > >> wes@xxxxxxxxxxxxxxxxx
>>> > >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >>
>>> > >>
>>> > >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita <micou12@xxxxxxxxx
>>> >
>>> > >> wrote:
>>> > >>
>>> > >>> I tried that on one of my pools (pool id 3): the number of PGs was
>>> > >>> increased, but the number of pgs not deep-scrubbed in time also went up,
>>> > >>> from 55 to 100. I also set autoscaling to off mode. Before continuing with
>>> > >>> the other pools, I would like to ask whether there has been any negative
>>> > >>> impact so far.
>>> > >>>
>>> > >>> ceph -s
>>> > >>>   cluster:
>>> > >>>     id:     cb0caedc-eb5b-42d1-a34f-96facfda8c27
>>> > >>>     health: HEALTH_WARN
>>> > >>>             100 pgs not deep-scrubbed in time
>>> > >>>
>>> > >>>   services:
>>> > >>>     mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>>> > >>>     mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
>>> > >>>     osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
>>> > >>>     rgw: 6 daemons active (6 hosts, 1 zones)
>>> > >>>
>>> > >>>   data:
>>> > >>>     pools:   10 pools, 609 pgs
>>> > >>>     objects: 6.03M objects, 23 TiB
>>> > >>>     usage:   151 TiB used, 282 TiB / 433 TiB avail
>>> > >>>     pgs:     603 active+clean
>>> > >>>              4   active+clean+scrubbing+deep
>>> > >>>              2   active+clean+scrubbing
>>> > >>>
>>> > >>>   io:
>>> > >>>     client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>>> > >>>
>>> > >>> root@ceph-osd3:/var/log# ceph df
>>> > >>> --- RAW STORAGE ---
>>> > >>> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
>>> > >>> hdd    433 TiB  282 TiB  151 TiB   151 TiB      34.93
>>> > >>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB      34.93
>>> > >>>
>>> > >>> --- POOLS ---
>>> > >>> POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>>> > >>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     72 TiB
>>> > >>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     72 TiB
>>> > >>> default.rgw.log         3  256  3.6 KiB      204  408 KiB      0     72 TiB
>>> > >>> default.rgw.control     4   32      0 B        8      0 B      0     72 TiB
>>> > >>> default.rgw.meta        5   32    382 B        2   24 KiB      0     72 TiB
>>> > >>> volumes                 6  128   21 TiB    5.74M   62 TiB  22.30     72 TiB
>>> > >>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     72 TiB
>>> > >>> backups                 8   32      0 B        0      0 B      0     72 TiB
>>> > >>> vms                     9   32  870 GiB  170.73k  2.5 TiB   1.13     72 TiB
>>> > >>> testbench              10   32      0 B        0      0 B      0     72 TiB
>>> > >>>
>>> > >>> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham <
>>> wes@xxxxxxxxxxxxxxxxx>
>>> > >>> wrote:
>>> > >>>
>>> > >>>> My best guess is it will take a couple of weeks to a couple of months to
>>> > >>>> complete on 10TB spinners at ~40% full. The cluster should be usable
>>> > >>>> throughout the process.
>>> > >>>>
>>> > >>>> Keep in mind, you should disable the pg autoscaler on any pool for which
>>> > >>>> you are manually adjusting the pg_num. Increasing the pg_num is called
>>> > >>>> "pg splitting"; you can google around for this to see how it works.
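>>> > >>>>
>>> > >>>> (As a rough sketch for the volumes pool, which is the one discussed here,
>>> > >>>> that would look like:
>>> > >>>>   ceph osd pool set volumes pg_autoscale_mode off
>>> > >>>>   ceph osd pool set volumes pg_num 256
>>> > >>>> with later jumps to 512 and beyond as needed.)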
>>> > >>>>
>>> > >>>> There are a few knobs to increase or decrease the aggressiveness of the
>>> > >>>> pg split; primarily these are osd_max_backfills and
>>> > >>>> target_max_misplaced_ratio.
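>>> > >>>>
>>> > >>>> (Assuming you use the centralized config database, tuning them looks
>>> > >>>> roughly like:
>>> > >>>>   ceph config set osd osd_max_backfills 2
>>> > >>>>   ceph config set mgr target_max_misplaced_ratio 0.05
>>> > >>>> Higher values make the split finish faster at the cost of more client
>>> > >>>> impact; the values above are just illustrative.)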
>>> > >>>>
>>> > >>>> You can monitor the progress of the split by looking at "ceph osd pool
>>> > >>>> ls detail" for the pool you are splitting; for this pool, pgp_num will
>>> > >>>> slowly increase until it reaches the pg_num / pg_num_target.
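>>> > >>>>
>>> > >>>> (For example, something along these lines works for keeping an eye on it:
>>> > >>>>   ceph osd pool get volumes pg_num
>>> > >>>>   ceph osd pool get volumes pgp_num
>>> > >>>>   watch -n 60 'ceph osd pool ls detail | grep volumes'
>>> > >>>> together with "ceph -s" for the backfill/misplaced counters.)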
>>> > >>>>
>>> > >>>> IMO this blog post best covers the issue which you are looking to
>>> > >>>> undertake:
>>> > >>>>
>>> https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
>>> > >>>>
>>> > >>>> Respectfully,
>>> > >>>>
>>> > >>>> *Wes Dillingham*
>>> > >>>> wes@xxxxxxxxxxxxxxxxx
>>> > >>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >>>>
>>> > >>>>
>>> > >>>> On Tue, Jan 30, 2024 at 9:38 AM Michel Niyoyita <
>>> micou12@xxxxxxxxx>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>>> Thanks for your advice Wes. Below is what ceph osd df tree shows.
>>> > >>>>> Will increasing the pg_num of the production cluster affect performance
>>> > >>>>> or cause a crash? How long can it take to finish?
>>> > >>>>>
>>> > >>>>> ceph osd df tree
>>> > >>>>> ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>>> > >>>>> -1         433.11841         -  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86  1.00    -          root default
>>> > >>>>> -3         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -              host ceph-osd1
>>> > >>>>>  2    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB  1021 GiB  5.4 MiB  3.7 GiB  6.3 TiB  30.40  0.87   19      up          osd.2
>>> > >>>>>  3    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB   931 GiB  4.1 MiB  3.5 GiB  6.4 TiB  29.43  0.84   29      up          osd.3
>>> > >>>>>  6    hdd    9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.1 MiB  4.5 GiB  5.8 TiB  36.09  1.04   20      up          osd.6
>>> > >>>>>  9    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.0 TiB  6.6 MiB  3.8 GiB  6.2 TiB  30.97  0.89   23      up          osd.9
>>> > >>>>> 12    hdd    9.02330   1.00000  9.0 TiB  4.0 TiB   2.3 TiB   13 MiB  6.1 GiB  5.0 TiB  44.68  1.28   30      up          osd.12
>>> > >>>>> 15    hdd    9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.2 GiB  5.5 TiB  38.99  1.12   30      up          osd.15
>>> > >>>>> 18    hdd    9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.0 GiB  6.1 TiB  32.80  0.94   21      up          osd.18
>>> > >>>>> 22    hdd    9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.4 GiB  5.4 TiB  40.25  1.15   22      up          osd.22
>>> > >>>>> 25    hdd    9.02330   1.00000  9.0 TiB  3.9 TiB   2.1 TiB   12 MiB  5.7 GiB  5.1 TiB  42.94  1.23   22      up          osd.25
>>> > >>>>> 28    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.1 GiB  5.9 TiB  34.87  1.00   21      up          osd.28
>>> > >>>>> 32    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB  1017 GiB  4.8 MiB  3.7 GiB  6.3 TiB  30.36  0.87   27      up          osd.32
>>> > >>>>> 35    hdd    9.02330   1.00000  9.0 TiB  3.0 TiB   1.3 TiB  7.2 MiB  4.2 GiB  6.0 TiB  33.73  0.97   21      up          osd.35
>>> > >>>>> 38    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.1 GiB  5.9 TiB  34.57  0.99   24      up          osd.38
>>> > >>>>> 41    hdd    9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  6.2 MiB  4.0 GiB  6.1 TiB  32.49  0.93   24      up          osd.41
>>> > >>>>> 44    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.4 GiB  5.9 TiB  34.87  1.00   29      up          osd.44
>>> > >>>>> 47    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB  1016 GiB  5.4 MiB  3.6 GiB  6.3 TiB  30.35  0.87   23      up          osd.47
>>> > >>>>> -7         144.37280         -  144 TiB   50 TiB    22 TiB  122 MiB   70 GiB   94 TiB  34.86  1.00    -              host ceph-osd2
>>> > >>>>>  1    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.7 MiB  3.8 GiB  6.2 TiB  31.00  0.89   27      up          osd.1
>>> > >>>>>  5    hdd    9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  7.3 MiB  4.5 GiB  5.8 TiB  35.45  1.02   27      up          osd.5
>>> > >>>>>  8    hdd    9.02330   1.00000  9.0 TiB  3.3 TiB   1.6 TiB  8.3 MiB  4.7 GiB  5.7 TiB  36.85  1.06   30      up          osd.8
>>> > >>>>> 10    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  34.87  1.00   20      up          osd.10
>>> > >>>>> 13    hdd    9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.3 GiB  5.4 TiB  39.63  1.14   27      up          osd.13
>>> > >>>>> 16    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  6.0 MiB  3.8 GiB  6.2 TiB  31.01  0.89   19      up          osd.16
>>> > >>>>> 19    hdd    9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.0 GiB  6.1 TiB  32.77  0.94   21      up          osd.19
>>> > >>>>> 21    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.5 MiB  3.7 GiB  6.2 TiB  31.58  0.91   26      up          osd.21
>>> > >>>>> 24    hdd    9.02330   1.00000  9.0 TiB  2.6 TiB   855 GiB  4.7 MiB  3.3 GiB  6.4 TiB  28.61  0.82   19      up          osd.24
>>> > >>>>> 27    hdd    9.02330   1.00000  9.0 TiB  3.7 TiB   1.9 TiB   10 MiB  5.2 GiB  5.3 TiB  40.84  1.17   24      up          osd.27
>>> > >>>>> 30    hdd    9.02330   1.00000  9.0 TiB  3.2 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  35.16  1.01   22      up          osd.30
>>> > >>>>> 33    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.6 MiB  4.3 GiB  5.9 TiB  34.59  0.99   23      up          osd.33
>>> > >>>>> 36    hdd    9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB   10 MiB  5.0 GiB  5.6 TiB  38.17  1.09   25      up          osd.36
>>> > >>>>> 39    hdd    9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  8.5 MiB  5.1 GiB  5.6 TiB  37.79  1.08   31      up          osd.39
>>> > >>>>> 42    hdd    9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.2 GiB  5.4 TiB  39.68  1.14   23      up          osd.42
>>> > >>>>> 45    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB   964 GiB  5.1 MiB  3.5 GiB  6.3 TiB  29.78  0.85   21      up          osd.45
>>> > >>>>> -5         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -              host ceph-osd3
>>> > >>>>>  0    hdd    9.02330   1.00000  9.0 TiB  2.7 TiB   934 GiB  4.9 MiB  3.4 GiB  6.4 TiB  29.47  0.85   21      up          osd.0
>>> > >>>>>  4    hdd    9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.1 GiB  6.1 TiB  32.73  0.94   22      up          osd.4
>>> > >>>>>  7    hdd    9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.1 GiB  5.5 TiB  39.02  1.12   30      up          osd.7
>>> > >>>>> 11    hdd    9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.1 GiB  5.4 TiB  39.97  1.15   27      up          osd.11
>>> > >>>>> 14    hdd    9.02330   1.00000  9.0 TiB  3.5 TiB   1.7 TiB   10 MiB  5.1 GiB  5.6 TiB  38.24  1.10   27      up          osd.14
>>> > >>>>> 17    hdd    9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.1 GiB  6.0 TiB  33.09  0.95   23      up          osd.17
>>> > >>>>> 20    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.6 MiB  3.8 GiB  6.2 TiB  31.55  0.90   20      up          osd.20
>>> > >>>>> 23    hdd    9.02330   1.00000  9.0 TiB  2.6 TiB   828 GiB  4.0 MiB  3.3 GiB  6.5 TiB  28.32  0.81   23      up          osd.23
>>> > >>>>> 26    hdd    9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  5.8 MiB  3.8 GiB  6.1 TiB  32.12  0.92   26      up          osd.26
>>> > >>>>> 29    hdd    9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.1 GiB  5.4 TiB  39.73  1.14   24      up          osd.29
>>> > >>>>> 31    hdd    9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.8 MiB  3.7 GiB  6.2 TiB  31.56  0.91   22      up          osd.31
>>> > >>>>> 34    hdd    9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.2 MiB  4.6 GiB  5.7 TiB  36.29  1.04   23      up          osd.34
>>> > >>>>> 37    hdd    9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  8.2 MiB  4.5 GiB  5.8 TiB  35.51  1.02   20      up          osd.37
>>> > >>>>> 40    hdd    9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  9.3 MiB  4.9 GiB  5.6 TiB  38.16  1.09   25      up          osd.40
>>> > >>>>> 43    hdd    9.02330   1.00000  9.0 TiB  3.4 TiB   1.6 TiB  8.5 MiB  4.8 GiB  5.7 TiB  37.19  1.07   29      up          osd.43
>>> > >>>>> 46    hdd    9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.4 MiB  4.4 GiB  5.9 TiB  34.85  1.00   23      up          osd.46
>>> > >>>>>                          TOTAL  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86
>>> > >>>>> MIN/MAX VAR: 0.81/1.28  STDDEV: 3.95
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> Michel
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> On Tue, Jan 30, 2024 at 4:18 PM Wesley Dillingham <
>>> > >>>>> wes@xxxxxxxxxxxxxxxxx> wrote:
>>> > >>>>>
>>> > >>>>>> I now concur you should increase the pg_num as a first step for this
>>> > >>>>>> cluster. Disable the pg autoscaler for the volumes pool and increase it
>>> > >>>>>> to pg_num 256. Then likely re-assess and make the next power-of-2 jump
>>> > >>>>>> to 512, and probably beyond.
>>> > >>>>>>
>>> > >>>>>> Keep in mind this is not going to fix your short-term deep-scrub issue;
>>> > >>>>>> in fact it will increase the number of not-scrubbed-in-time PGs until
>>> > >>>>>> the pg_num change is complete. This is because OSDs don't scrub when
>>> > >>>>>> they are backfilling.
>>> > >>>>>>
>>> > >>>>>> I would sit on 256 for a couple of weeks and let scrubs happen, then
>>> > >>>>>> continue past 256.
>>> > >>>>>>
>>> > >>>>>> The ultimate target is around 100-200 PGs per OSD, which "ceph osd df
>>> > >>>>>> tree" will show you in the PGS column.
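>>> > >>>>>>
>>> > >>>>>> (Rough arithmetic from the numbers in this thread: 48 OSDs x ~150 PGs
>>> > >>>>>> each, divided by 3 replicas, is roughly 2400 PGs across all pools, so
>>> > >>>>>> the volumes pool, which holds nearly all the data, could eventually
>>> > >>>>>> justify something in the 1024-2048 range.)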
>>> > >>>>>>
>>> > >>>>>> Respectfully,
>>> > >>>>>>
>>> > >>>>>> *Wes Dillingham*
>>> > >>>>>> wes@xxxxxxxxxxxxxxxxx
>>> > >>>>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita <
>>> micou12@xxxxxxxxx>
>>> > >>>>>> wrote:
>>> > >>>>>>
>>> > >>>>>>> Dear team,
>>> > >>>>>>>
>>> > >>>>>>> below is the output of ceph df command and the ceph version I
>>> am
>>> > >>>>>>> running
>>> > >>>>>>>
>>> > >>>>>>>  ceph df
>>> > >>>>>>> --- RAW STORAGE ---
>>> > >>>>>>> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
>>> > >>>>>>> hdd    433 TiB  282 TiB  151 TiB   151 TiB      34.82
>>> > >>>>>>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB      34.82
>>> > >>>>>>>
>>> > >>>>>>> --- POOLS ---
>>> > >>>>>>> POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>>> > >>>>>>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     73 TiB
>>> > >>>>>>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     73 TiB
>>> > >>>>>>> default.rgw.log         3   32  3.6 KiB      209  408 KiB      0     73 TiB
>>> > >>>>>>> default.rgw.control     4   32      0 B        8      0 B      0     73 TiB
>>> > >>>>>>> default.rgw.meta        5   32    382 B        2   24 KiB      0     73 TiB
>>> > >>>>>>> volumes                 6  128   21 TiB    5.68M   62 TiB  22.09     73 TiB
>>> > >>>>>>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     73 TiB
>>> > >>>>>>> backups                 8   32      0 B        0      0 B      0     73 TiB
>>> > >>>>>>> vms                     9   32  881 GiB  174.30k  2.5 TiB   1.13     73 TiB
>>> > >>>>>>> testbench              10   32      0 B        0      0 B      0     73 TiB
>>> > >>>>>>> root@ceph-mon1:~# ceph --version
>>> > >>>>>>> ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
>>> > >>>>>>> pacific
>>> > >>>>>>> (stable)
>>> > >>>>>>> root@ceph-mon1:~#
>>> > >>>>>>>
>>> > >>>>>>> please advise accordingly
>>> > >>>>>>>
>>> > >>>>>>> Michel
>>> > >>>>>>>
>>> > >>>>>>> On Mon, Jan 29, 2024 at 9:48 PM Frank Schilder <frans@xxxxxx>
>>> wrote:
>>> > >>>>>>>
>>> > >>>>>>> > You will have to look at the output of "ceph df" and make a decision
>>> > >>>>>>> > to balance "objects per PG" and "GB per PG". Increase the PG count
>>> > >>>>>>> > most for the pools with the worst of these two numbers, such that it
>>> > >>>>>>> > balances out as much as possible. If you have pools that see
>>> > >>>>>>> > significantly more user-IO than others, prioritise these.
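>>> > >>>>>>> >
>>> > >>>>>>> > (As a rough worked example from the "ceph df" output above: the
>>> > >>>>>>> > volumes pool holds ~5.7M objects and ~21 TiB over 128 PGs, i.e. about
>>> > >>>>>>> > 45k objects and ~170 GiB per PG; doubling to 256 PGs would halve
>>> > >>>>>>> > both.)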
>>> > >>>>>>> >
>>> > >>>>>>> > You will have to find out for your specific cluster, we can
>>> only
>>> > >>>>>>> give
>>> > >>>>>>> > general guidelines. Make changes, run benchmarks,
>>> re-evaluate.
>>> > >>>>>>> Take the
>>> > >>>>>>> > time for it. The better you know your cluster and your
>>> users, the
>>> > >>>>>>> better
>>> > >>>>>>> > the end result will be.
>>> > >>>>>>> >
>>> > >>>>>>> > Best regards,
>>> > >>>>>>> > =================
>>> > >>>>>>> > Frank Schilder
>>> > >>>>>>> > AIT Risø Campus
>>> > >>>>>>> > Bygning 109, rum S14
>>> > >>>>>>> >
>>> > >>>>>>> > ________________________________________
>>> > >>>>>>> > From: Michel Niyoyita <micou12@xxxxxxxxx>
>>> > >>>>>>> > Sent: Monday, January 29, 2024 2:04 PM
>>> > >>>>>>> > To: Janne Johansson
>>> > >>>>>>> > Cc: Frank Schilder; E Taka; ceph-users
>>> > >>>>>>> > Subject: Re:  Re: 6 pgs not deep-scrubbed in time
>>> > >>>>>>> >
>>> > >>>>>>> > This is how it is set; if you suggest making some changes, please
>>> > >>>>>>> > advise.
>>> > >>>>>>> >
>>> > >>>>>>> > Thank you.
>>> > >>>>>>> >
>>> > >>>>>>> >
>>> > >>>>>>> > ceph osd pool ls detail
>>> > >>>>>>> > pool 1 'device_health_metrics' replicated size 3 min_size 2
>>> > >>>>>>> crush_rule 0
>>> > >>>>>>> > object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on
>>> > >>>>>>> last_change 1407
>>> > >>>>>>> > flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1
>>> > >>>>>>> application
>>> > >>>>>>> > mgr_devicehealth
>>> > >>>>>>> > pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0
>>> > >>>>>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
>>> 1393
>>> > >>>>>>> flags
>>> > >>>>>>> > hashpspool stripe_width 0 application rgw
>>> > >>>>>>> > pool 3 'default.rgw.log' replicated size 3 min_size 2
>>> crush_rule 0
>>> > >>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
>>> > >>>>>>> last_change
>>> > >>>>>>> > 1394 flags hashpspool stripe_width 0 application rgw
>>> > >>>>>>> > pool 4 'default.rgw.control' replicated size 3 min_size 2
>>> > >>>>>>> crush_rule 0
>>> > >>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
>>> > >>>>>>> last_change
>>> > >>>>>>> > 1395 flags hashpspool stripe_width 0 application rgw
>>> > >>>>>>> > pool 5 'default.rgw.meta' replicated size 3 min_size 2
>>> crush_rule 0
>>> > >>>>>>> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
>>> > >>>>>>> last_change
>>> > >>>>>>> > 1396 flags hashpspool stripe_width 0 pg_autoscale_bias 4
>>> > >>>>>>> application rgw
>>> > >>>>>>> > pool 6 'volumes' replicated size 3 min_size 2 crush_rule 0
>>> > >>>>>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change
>>> > >>>>>>> 108802 lfor
>>> > >>>>>>> > 0/0/14812 flags hashpspool,selfmanaged_snaps stripe_width 0
>>> > >>>>>>> application rbd
>>> > >>>>>>> >         removed_snaps_queue
>>> > >>>>>>> >
>>> > >>>>>>>
>>> [22d7~3,11561~2,11571~1,11573~1c,11594~6,1159b~f,115b0~1,115b3~1,115c3~1,115f3~1,115f5~e,11613~6,1161f~c,11637~1b,11660~1,11663~2,11673~1,116d1~c,116f5~10,11721~c]
>>> > >>>>>>> > pool 7 'images' replicated size 3 min_size 2 crush_rule 0
>>> > >>>>>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
>>> 94609
>>> > >>>>>>> flags
>>> > >>>>>>> > hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>> > >>>>>>> > pool 8 'backups' replicated size 3 min_size 2 crush_rule 0
>>> > >>>>>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
>>> 1399
>>> > >>>>>>> flags
>>> > >>>>>>> > hashpspool stripe_width 0 application rbd
>>> > >>>>>>> > pool 9 'vms' replicated size 3 min_size 2 crush_rule 0
>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
>>> 108783
>>> > >>>>>>> lfor
>>> > >>>>>>> > 0/561/559 flags hashpspool,selfmanaged_snaps stripe_width 0
>>> > >>>>>>> application rbd
>>> > >>>>>>> >         removed_snaps_queue [3fa~1,3fc~3,400~1,402~1]
>>> > >>>>>>> > pool 10 'testbench' replicated size 3 min_size 2 crush_rule 0
>>> > >>>>>>> object_hash
>>> > >>>>>>> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
>>> 20931
>>> > >>>>>>> lfor
>>> > >>>>>>> > 0/20931/20929 flags hashpspool stripe_width 0
>>> > >>>>>>> >
>>> > >>>>>>> >
>>> > >>>>>>> > On Mon, Jan 29, 2024 at 2:09 PM Michel Niyoyita <
>>> micou12@xxxxxxxxx
>>> > >>>>>>> <mailto:
>>> > >>>>>>> > micou12@xxxxxxxxx>> wrote:
>>> > >>>>>>> > Thank you Janne ,
>>> > >>>>>>> >
>>> > >>>>>>> > Is there no need to set some flags like "ceph osd set nodeep-scrub"?
>>> > >>>>>>> >
>>> > >>>>>>> > Thank you
>>> > >>>>>>> >
>>> > >>>>>>> > On Mon, Jan 29, 2024 at 2:04 PM Janne Johansson <
>>> > >>>>>>> icepic.dz@xxxxxxxxx
>>> > >>>>>>> > <mailto:icepic.dz@xxxxxxxxx>> wrote:
>>> > >>>>>>> > Den mån 29 jan. 2024 kl 12:58 skrev Michel Niyoyita <
>>> > >>>>>>> micou12@xxxxxxxxx
>>> > >>>>>>> > <mailto:micou12@xxxxxxxxx>>:
>>> > >>>>>>> > >
>>> > >>>>>>> > > Thank you Frank ,
>>> > >>>>>>> > >
>>> > >>>>>>> > > All disks are HDDs. I would like to know if I can increase the
>>> > >>>>>>> > > number of PGs live in production without a negative impact on the
>>> > >>>>>>> > > cluster, and if yes, which commands to use.
>>> > >>>>>>> >
>>> > >>>>>>> > Yes. "ceph osd pool set <poolname> pg_num <number larger than
>>> > >>>>>>> before>"
>>> > >>>>>>> > where the number usually should be a power of two that leads
>>> to a
>>> > >>>>>>> > number of PGs per OSD between 100-200.
>>> > >>>>>>> >
>>> > >>>>>>> > --
>>> > >>>>>>> > May the most significant bit of your life be positive.
>>> > >>>>>>> >
>>> > >>>>>>> _______________________________________________
>>> > >>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> > >>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> > >>>>>>>
>>> > >>>>>>
>>> > _______________________________________________
>>> > ceph-users mailing list -- ceph-users@xxxxxxx
>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>>
>>>
>>> --
>>> May the most significant bit of your life be positive.
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



