Thanks for your advice, Wes. Below is what ceph osd df tree shows. Will increasing pg_num on this production cluster affect performance or cause a crash, and how long can it take to finish?

ceph osd df tree
ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         433.11841         -  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86  1.00    -          root default
-3         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd1
 2  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1021 GiB  5.4 MiB  3.7 GiB  6.3 TiB  30.40  0.87   19      up  osd.2
 3  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   931 GiB  4.1 MiB  3.5 GiB  6.4 TiB  29.43  0.84   29      up  osd.3
 6  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.1 MiB  4.5 GiB  5.8 TiB  36.09  1.04   20      up  osd.6
 9  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.0 TiB  6.6 MiB  3.8 GiB  6.2 TiB  30.97  0.89   23      up  osd.9
12  hdd      9.02330   1.00000  9.0 TiB  4.0 TiB   2.3 TiB   13 MiB  6.1 GiB  5.0 TiB  44.68  1.28   30      up  osd.12
15  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.2 GiB  5.5 TiB  38.99  1.12   30      up  osd.15
18  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.0 GiB  6.1 TiB  32.80  0.94   21      up  osd.18
22  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.4 GiB  5.4 TiB  40.25  1.15   22      up  osd.22
25  hdd      9.02330   1.00000  9.0 TiB  3.9 TiB   2.1 TiB   12 MiB  5.7 GiB  5.1 TiB  42.94  1.23   22      up  osd.25
28  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.1 GiB  5.9 TiB  34.87  1.00   21      up  osd.28
32  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1017 GiB  4.8 MiB  3.7 GiB  6.3 TiB  30.36  0.87   27      up  osd.32
35  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.3 TiB  7.2 MiB  4.2 GiB  6.0 TiB  33.73  0.97   21      up  osd.35
38  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.1 GiB  5.9 TiB  34.57  0.99   24      up  osd.38
41  hdd      9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  6.2 MiB  4.0 GiB  6.1 TiB  32.49  0.93   24      up  osd.41
44  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.3 MiB  4.4 GiB  5.9 TiB  34.87  1.00   29      up  osd.44
47  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB  1016 GiB  5.4 MiB  3.6 GiB  6.3 TiB  30.35  0.87   23      up  osd.47
-7         144.37280         -  144 TiB   50 TiB    22 TiB  122 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd2
 1  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.7 MiB  3.8 GiB  6.2 TiB  31.00  0.89   27      up  osd.1
 5  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  7.3 MiB  4.5 GiB  5.8 TiB  35.45  1.02   27      up  osd.5
 8  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.6 TiB  8.3 MiB  4.7 GiB  5.7 TiB  36.85  1.06   30      up  osd.8
10  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  34.87  1.00   20      up  osd.10
13  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.3 GiB  5.4 TiB  39.63  1.14   27      up  osd.13
16  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  6.0 MiB  3.8 GiB  6.2 TiB  31.01  0.89   19      up  osd.16
19  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.0 GiB  6.1 TiB  32.77  0.94   21      up  osd.19
21  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.5 MiB  3.7 GiB  6.2 TiB  31.58  0.91   26      up  osd.21
24  hdd      9.02330   1.00000  9.0 TiB  2.6 TiB   855 GiB  4.7 MiB  3.3 GiB  6.4 TiB  28.61  0.82   19      up  osd.24
27  hdd      9.02330   1.00000  9.0 TiB  3.7 TiB   1.9 TiB   10 MiB  5.2 GiB  5.3 TiB  40.84  1.17   24      up  osd.27
30  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.4 TiB  7.5 MiB  4.5 GiB  5.9 TiB  35.16  1.01   22      up  osd.30
33  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.6 MiB  4.3 GiB  5.9 TiB  34.59  0.99   23      up  osd.33
36  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB   10 MiB  5.0 GiB  5.6 TiB  38.17  1.09   25      up  osd.36
39  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  8.5 MiB  5.1 GiB  5.6 TiB  37.79  1.08   31      up  osd.39
42  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.2 GiB  5.4 TiB  39.68  1.14   23      up  osd.42
45  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   964 GiB  5.1 MiB  3.5 GiB  6.3 TiB  29.78  0.85   21      up  osd.45
-5         144.37280         -  144 TiB   50 TiB    22 TiB  121 MiB   70 GiB   94 TiB  34.86  1.00    -          host ceph-osd3
 0  hdd      9.02330   1.00000  9.0 TiB  2.7 TiB   934 GiB  4.9 MiB  3.4 GiB  6.4 TiB  29.47  0.85   21      up  osd.0
 4  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.1 GiB  6.1 TiB  32.73  0.94   22      up  osd.4
 7  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.1 GiB  5.5 TiB  39.02  1.12   30      up  osd.7
11  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.1 GiB  5.4 TiB  39.97  1.15   27      up  osd.11
14  hdd      9.02330   1.00000  9.0 TiB  3.5 TiB   1.7 TiB   10 MiB  5.1 GiB  5.6 TiB  38.24  1.10   27      up  osd.14
17  hdd      9.02330   1.00000  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.1 GiB  6.0 TiB  33.09  0.95   23      up  osd.17
20  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.6 MiB  3.8 GiB  6.2 TiB  31.55  0.90   20      up  osd.20
23  hdd      9.02330   1.00000  9.0 TiB  2.6 TiB   828 GiB  4.0 MiB  3.3 GiB  6.5 TiB  28.32  0.81   23      up  osd.23
26  hdd      9.02330   1.00000  9.0 TiB  2.9 TiB   1.2 TiB  5.8 MiB  3.8 GiB  6.1 TiB  32.12  0.92   26      up  osd.26
29  hdd      9.02330   1.00000  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.1 GiB  5.4 TiB  39.73  1.14   24      up  osd.29
31  hdd      9.02330   1.00000  9.0 TiB  2.8 TiB   1.1 TiB  5.8 MiB  3.7 GiB  6.2 TiB  31.56  0.91   22      up  osd.31
34  hdd      9.02330   1.00000  9.0 TiB  3.3 TiB   1.5 TiB  8.2 MiB  4.6 GiB  5.7 TiB  36.29  1.04   23      up  osd.34
37  hdd      9.02330   1.00000  9.0 TiB  3.2 TiB   1.5 TiB  8.2 MiB  4.5 GiB  5.8 TiB  35.51  1.02   20      up  osd.37
40  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.7 TiB  9.3 MiB  4.9 GiB  5.6 TiB  38.16  1.09   25      up  osd.40
43  hdd      9.02330   1.00000  9.0 TiB  3.4 TiB   1.6 TiB  8.5 MiB  4.8 GiB  5.7 TiB  37.19  1.07   29      up  osd.43
46  hdd      9.02330   1.00000  9.0 TiB  3.1 TiB   1.4 TiB  8.4 MiB  4.4 GiB  5.9 TiB  34.85  1.00   23      up  osd.46
                         TOTAL  433 TiB  151 TiB    67 TiB  364 MiB  210 GiB  282 TiB  34.86
MIN/MAX VAR: 0.81/1.28  STDDEV: 3.95

Michel
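For what it's worth, instead of eyeballing the PGS column across 48 rows, a small shell sketch like the one below summarizes it. It assumes the column layout shown above (PGS is the third field from the right on OSD rows) and that all OSDs are class hdd, as they are in this cluster; adjust the filter if your output differs.

  ceph osd df tree | awk '
      # OSD rows only: device class in column 2, "osd.N" in the last column
      $2 == "hdd" && $NF ~ /^osd\./ {
          pgs = $(NF-2)                  # PGS is the 3rd field from the right
          sum += pgs; n++
          if (min == "" || pgs < min) min = pgs
          if (pgs > max) max = pgs
      }
      END { printf "osds=%d  pgs: min=%d max=%d avg=%.1f\n", n, min, max, sum/n }'

On the output above this comes out to roughly 19-31 PGs per OSD with an average around 24, well below the 100-200 per OSD that Wes mentions below.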
On Tue, Jan 30, 2024 at 4:18 PM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:

> I now concur you should increase the pg_num as a first step for this
> cluster. Disable the pg autoscaler for the volumes pool and increase it to
> pg_num 256. Then likely re-assess and make the next power-of-2 jump to 512,
> and probably beyond.
>
> Keep in mind this is not going to fix your short-term deep-scrub issue; in
> fact it will increase the number of "not scrubbed in time" PGs until the
> pg_num change is complete. This is because OSDs don't scrub while they are
> backfilling.
>
> I would sit on 256 for a couple of weeks, let scrubs happen, and then
> continue past 256, with the ultimate target of around 100-200 PGs per OSD,
> which "ceph osd df tree" will show you in the PGS column.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
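For reference, the first step Wes describes maps to roughly the commands below. This is a sketch only; the pool name comes from the ceph df output further down the thread, so double-check it against "ceph osd pool ls" on your own cluster before running anything in production.

  # keep the autoscaler from fighting the manual change on this pool
  ceph osd pool set volumes pg_autoscale_mode off

  # request the split; since Nautilus, pg_num and pgp_num are raised
  # gradually in the background rather than all at once
  ceph osd pool set volumes pg_num 256

  # watch pg_num creep up to the target and the backfill drain off
  ceph osd pool get volumes pg_num
  ceph status

Because the increase is applied gradually (the amount of misplaced data in flight is throttled), the change can take a while to complete on a cluster this size, which is the "how long" part of the question above.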
On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita <micou12@xxxxxxxxx> wrote:

>> Dear team,
>>
>> Below is the output of the ceph df command and the ceph version I am running:
>>
>> ceph df
>> --- RAW STORAGE ---
>> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> hdd    433 TiB  282 TiB  151 TiB  151 TiB       34.82
>> TOTAL  433 TiB  282 TiB  151 TiB  151 TiB       34.82
>>
>> --- POOLS ---
>> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     73 TiB
>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     73 TiB
>> default.rgw.log         3   32  3.6 KiB      209  408 KiB      0     73 TiB
>> default.rgw.control     4   32      0 B        8      0 B      0     73 TiB
>> default.rgw.meta        5   32    382 B        2   24 KiB      0     73 TiB
>> volumes                 6  128   21 TiB    5.68M   62 TiB  22.09     73 TiB
>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     73 TiB
>> backups                 8   32      0 B        0      0 B      0     73 TiB
>> vms                     9   32  881 GiB  174.30k  2.5 TiB   1.13     73 TiB
>> testbench              10   32      0 B        0      0 B      0     73 TiB
>>
>> root@ceph-mon1:~# ceph --version
>> ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)
>> root@ceph-mon1:~#
>>
>> Please advise accordingly.
>>
>> Michel
>>
>> On Mon, Jan 29, 2024 at 9:48 PM Frank Schilder <frans@xxxxxx> wrote:
>>
>> > You will have to look at the output of "ceph df" and make a decision to
>> > balance "objects per PG" and "GB per PG". Increase the PG count most for
>> > the pools with the worst of these two numbers, such that it balances out
>> > as much as possible. If you have pools that see significantly more user-IO
>> > than others, prioritise these.
>> >
>> > You will have to find out for your specific cluster; we can only give
>> > general guidelines. Make changes, run benchmarks, re-evaluate. Take the
>> > time for it. The better you know your cluster and your users, the better
>> > the end result will be.
>> >
>> > Best regards,
>> > =================
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
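As a rough worked example of Frank's two ratios, using the ceph df figures quoted above (numbers rounded):

  volumes: 5.68M objects / 128 PGs ≈ 44,000 objects per PG
           21 TiB stored / 128 PGs ≈ 170 GiB per PG
  vms:     174.3k objects / 32 PGs ≈ 5,400 objects per PG
           881 GiB stored / 32 PGs ≈ 28 GiB per PG
  images:  112.5k objects / 32 PGs ≈ 3,500 objects per PG
           878 GiB stored / 32 PGs ≈ 27 GiB per PG

By both measures volumes is roughly an order of magnitude heavier per PG than any other pool, which is why it is the one to grow first.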
>> > ________________________________________
>> > From: Michel Niyoyita <micou12@xxxxxxxxx>
>> > Sent: Monday, January 29, 2024 2:04 PM
>> > To: Janne Johansson
>> > Cc: Frank Schilder; E Taka; ceph-users
>> > Subject: Re: Re: 6 pgs not deep-scrubbed in time
>> >
>> > This is how it is set; if you suggest making some changes, please advise.
>> >
>> > Thank you.
>> >
>> > ceph osd pool ls detail
>> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 1407 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
>> > pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1393 flags hashpspool stripe_width 0 application rgw
>> > pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1394 flags hashpspool stripe_width 0 application rgw
>> > pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1395 flags hashpspool stripe_width 0 application rgw
>> > pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1396 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
>> > pool 6 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 108802 lfor 0/0/14812 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> >         removed_snaps_queue [22d7~3,11561~2,11571~1,11573~1c,11594~6,1159b~f,115b0~1,115b3~1,115c3~1,115f3~1,115f5~e,11613~6,1161f~c,11637~1b,11660~1,11663~2,11673~1,116d1~c,116f5~10,11721~c]
>> > pool 7 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94609 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> > pool 8 'backups' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1399 flags hashpspool stripe_width 0 application rbd
>> > pool 9 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 108783 lfor 0/561/559 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> >         removed_snaps_queue [3fa~1,3fc~3,400~1,402~1]
>> > pool 10 'testbench' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 20931 lfor 0/20931/20929 flags hashpspool stripe_width 0
>> >
>> > On Mon, Jan 29, 2024 at 2:09 PM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>> > Thank you Janne,
>> >
>> > No need of setting some flags like "ceph osd set nodeep-scrub"?
>> >
>> > Thank you
>> >
>> > On Mon, Jan 29, 2024 at 2:04 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>> > On Mon, Jan 29, 2024 at 12:58 PM Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>> > >
>> > > Thank you Frank,
>> > >
>> > > All disks are HDDs. Would like to know if I can increase the number of
>> > > PGs live in production without a negative impact on the cluster, and if
>> > > yes, which commands to use.
>> >
>> > Yes. "ceph osd pool set <poolname> pg_num <number larger than before>",
>> > where the number usually should be a power of two that leads to a
>> > number of PGs per OSD between 100-200.
>> >
>> > --
>> > May the most significant bit of your life be positive.
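Putting Janne's rule of thumb against this cluster's numbers, as a rough back-of-the-envelope check only:

  current: 385 PGs across all pools x 3 replicas / 48 OSDs ≈ 24 PGs per OSD
           (which matches the PGS column in the ceph osd df tree output above)
  target:  48 OSDs x ~150 PGs per OSD / 3 replicas ≈ 2400 PGs to distribute

Since volumes holds nearly all of the data, most of that budget would eventually go to it, i.e. power-of-two steps of 256, 512, 1024 and possibly beyond, which lines up with Wes's advice at the top of the thread.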
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx