Re: Very slow backfilling

root@hbgt-ceph1-mon3:/# ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 1    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB   11 KiB   23 GiB   11 TiB  36.17  1.39   17      up
 3    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB  3.7 GiB   17 GiB   12 TiB  28.47  1.09   11      up
 5    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB  3.2 GiB   12 GiB   14 TiB  20.89  0.80   13      up
 7    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  3.2 GiB  6.9 GiB   15 TiB  13.32  0.51   19      up
 9    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB   68 MiB   18 GiB   12 TiB  28.53  1.09   18      up
11    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  403 MiB   23 GiB   11 TiB  36.13  1.38   17      up
13    hdd  17.34140   1.00000   17 TiB  1001 GiB  7.1 GiB  9.9 MiB  1.1 GiB   16 TiB   5.64  0.22   18      up
15    hdd  17.34140   1.00000   17 TiB   8.9 TiB  7.9 TiB  842 KiB   34 GiB  8.4 TiB  51.41  1.97   18      up
17    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.90  0.80   17      up
19    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  4.1 GiB  6.2 GiB   15 TiB  13.31  0.51   18      up
21    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB  206 MiB   17 GiB   12 TiB  28.55  1.09   23      up
23    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.54  1.09   14      up
 0    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB  7.2 GiB   12 GiB   14 TiB  20.94  0.80   18      up
 2    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.93  0.80   13      up
 4    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB  3.0 GiB   12 GiB   14 TiB  20.95  0.80   20      up
 6    hdd  17.34140   1.00000   17 TiB   8.9 TiB  7.9 TiB  4.4 MiB   34 GiB  8.4 TiB  51.36  1.97   17      up
 8    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  965 KiB  6.5 GiB   15 TiB  13.26  0.51   14      up
10    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB   18 KiB  6.5 GiB   15 TiB  13.25  0.51   13      up
12    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB   98 MiB   17 GiB   12 TiB  28.49  1.09   16      up
14    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB  4.2 GiB   17 GiB   12 TiB  28.55  1.09   20      up
16    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   24 KiB   12 GiB   14 TiB  20.94  0.80   20      up
18    hdd  17.34140   1.00000   17 TiB   8.9 TiB  7.9 TiB   17 MiB   34 GiB  8.4 TiB  51.42  1.97   19      up
20    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB  3.2 GiB   17 GiB   12 TiB  28.50  1.09   18      up
22    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  2.7 GiB  6.2 GiB   15 TiB  13.25  0.51   11      up
24    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB   70 MiB   17 GiB   12 TiB  28.50  1.09   18      up
25    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB  3.0 GiB   17 GiB   12 TiB  28.51  1.09   16      up
26    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  3.0 GiB   23 GiB   11 TiB  36.13  1.38   15      up
27    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB  205 MiB   17 GiB   12 TiB  28.59  1.10   16      up
28    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  1.0 MiB  6.3 GiB   15 TiB  13.27  0.51   12      up
29    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB  1.3 MiB   17 GiB   12 TiB  28.50  1.09    4      up
30    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  379 KiB   23 GiB   11 TiB  36.14  1.38   16      up
31    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB  2.5 MiB   12 GiB   14 TiB  20.92  0.80   19      up
32    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   11 MiB   12 GiB   14 TiB  20.93  0.80   16      up
33    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   18 KiB   12 GiB   14 TiB  20.91  0.80   17      up
34    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB   71 MiB   23 GiB   11 TiB  36.15  1.38   19      up
35    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.3 GiB   15 TiB  13.28  0.51   14      up
36    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB      0 B   17 GiB   12 TiB  28.59  1.09   13      up
37    hdd  17.34140   1.00000   17 TiB   4.9 TiB  4.0 TiB   69 MiB   17 GiB   12 TiB  28.54  1.09   12      up
38    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  2.9 GiB  6.7 GiB   15 TiB  13.26  0.51   22      up
39    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  205 MiB   23 GiB   11 TiB  36.19  1.39   25      up
40    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB    9 KiB   12 GiB   14 TiB  20.88  0.80   14      up
41    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  8.2 GiB   23 GiB   11 TiB  36.11  1.38   20      up
42    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   55 KiB   12 GiB   14 TiB  20.91  0.80   16      up
43    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB   70 MiB   23 GiB   11 TiB  36.17  1.39   21      up
44    hdd  17.34140   1.00000   17 TiB   7.6 TiB  6.6 TiB   18 KiB   28 GiB  9.8 TiB  43.75  1.68   16      up
45    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  141 MiB  6.5 GiB   15 TiB  13.29  0.51   17      up
46    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  1.7 MiB  6.4 GiB   15 TiB  13.27  0.51   15      up
47    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB  3.5 GiB   11 GiB   14 TiB  20.89  0.80   22      up
48    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB    9 KiB  6.3 GiB   15 TiB  13.25  0.51   10      up
49    hdd  17.34140   1.00000   17 TiB   8.9 TiB  7.9 TiB    4 KiB   33 GiB  8.4 TiB  51.41  1.97   18      up
50    hdd  17.34140   1.00000   17 TiB   7.6 TiB  6.6 TiB  212 MiB   31 GiB  9.7 TiB  43.81  1.68   20      up
51    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.6 TiB   85 MiB   13 GiB   14 TiB  20.87  0.80   19      up
52    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  5.4 GiB  6.0 GiB   15 TiB  13.34  0.51   18      up
53    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB   25 MiB   19 GiB   12 TiB  28.55  1.09   16      up
54    hdd  17.34140   1.00000   17 TiB   6.2 TiB  5.3 TiB  198 MiB   23 GiB   11 TiB  35.99  1.38   14      up
55    hdd  17.34140   1.00000   17 TiB   5.0 TiB  4.0 TiB   10 GiB   18 GiB   12 TiB  28.59  1.09   26      up
56    hdd  17.34140   1.00000   17 TiB   6.3 TiB  5.3 TiB  153 MiB   24 GiB   11 TiB  36.14  1.38   22      up
57    hdd  17.34140   1.00000   17 TiB   3.6 TiB  2.7 TiB   58 KiB   12 GiB   14 TiB  20.91  0.80   13      up
58    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB  3.3 GiB  6.4 GiB   15 TiB  13.23  0.51   11      up
59    hdd  17.34140   1.00000   17 TiB   2.3 TiB  1.3 TiB   19 KiB  6.3 GiB   15 TiB  13.27  0.51   11      up
                        TOTAL  1.0 PiB   272 TiB  213 TiB   84 GiB  942 GiB  769 TiB  26.11


root@hbgt-ceph1-mon3:/# ceph osd dump | grep pool
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 15503 lfor 0/8533/8531 flags hashpspool stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8321 lfor 0/8321/8319 flags hashpspool stripe_width 0 application rgw
pool 3 'bkp365-ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8297 lfor 0/8297/8295 flags hashpspool stripe_width 0 application rgw
pool 4 'bkp365-ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 8054 lfor 0/8054/8052 flags hashpspool stripe_width 0 application rgw
pool 5 'bkp365-ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3412 lfor 0/3412/3410 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'bkp365-ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3500 lfor 0/0/2720 flags hashpspool stripe_width 12288 application rgw
pool 7 'bkp365-ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 3436 lfor 0/3436/3434 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 9 'ncy.rgw.buckets.data' erasure profile EC32 size 5 min_size 4 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14975 lfor 0/0/14973 flags hashpspool stripe_width 12288 application rgw
pool 10 'ncy.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14979 flags hashpspool stripe_width 0 application rgw
pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14981 flags hashpspool stripe_width 0 application rgw
pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105 lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236 lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 15241 flags hashpspool stripe_width 0 application rgw

(EC32 is an erasure-code profile with 3 data chunks and 2 coding chunks, i.e. k=3, m=2.)
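
For reference, a k=3/m=2 profile like this is normally created with something
along these lines (a sketch only; the failure domain is an assumption, it is
not visible in the output above):

    ceph osd erasure-code-profile set EC32 k=3 m=2 crush-failure-domain=host
    ceph osd erasure-code-profile get EC32

The stripe_width 12288 shown for pools 6 and 9 is consistent with k=3 and the
default 4 KiB stripe unit (3 x 4096 = 12288).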

"ceph osd pool autoscale-status" returns no output.
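
If autoscale-status stays empty, the per-pool autoscaler setting can still be
read directly, e.g. (using the EC data pool just as an example):

    ceph osd pool get ncy.rgw.buckets.data pg_autoscale_mode
    ceph osd pool get ncy.rgw.buckets.data pg_num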

On Thu, Mar 2, 2023 at 15:02, Curt <lightspd@xxxxxxxxx> wrote:

> Forgot to do a reply all.
>
> What does
>
> ceph osd df
> ceph osd dump | grep pool return?
>
> Are you using auto scaling? 289 PGs with 272 TB of data and 60 OSDs, that
> seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm thinking of
> this wrong.
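
(A rough sketch of that arithmetic from the outputs above: most of the
~127 TiB of data sits in the two EC 3+2 buckets.data pools, each with only
pg_num 32, so a single PG can carry several TiB and each OSD ends up with
only ~10-25 PG shards, as the PGS column in ceph osd df shows. The usual
rule of thumb targets roughly 100 PG shards per OSD, i.e. for the main data
pool something on the order of 60 OSDs x 100 / 5 shards (k+m) = 1200,
rounded to a power of two.)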
>
> On Thu, Mar 2, 2023, 17:37 Joffrey <joff.au@xxxxxxxxx> wrote:
>
>> My Ceph version is 17.2.5 and all osd_scrub* settings are at their
>> defaults. I tried some changes to osd-max-backfills, but saw no change.
>> I have many HDDs with NVMe for the DB, all connected over a 25G network.
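
(For reference, backfill/recovery throttles are usually adjusted cluster-wide
along these lines; a sketch only, and note that on Quincy the default mClock
scheduler may limit the effect of these values:

    ceph config set osd osd_max_backfills 2
    ceph config set osd osd_recovery_max_active 4
    ceph tell osd.0 config get osd_max_backfills    # check the effective value
)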
>>
>> Yes, it has been the same PG for 4 days.
>>
>> An HDD failed, and I went through many days of recovery+backfilling over
>> the last 2 weeks. Perhaps the 'not in time' warnings are related to this.
>>
>> 'Jof
>>
>> On Thu, Mar 2, 2023 at 14:25, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>>
>> > Run `ceph health detail`.
>> >
>> > Is it the same PG backfilling for a long time, or a different one over
>> > time?
>> >
>> > That it’s remapped makes me think that what you’re seeing is the
>> > balancer doing its job.
>> >
>> > As far as the scrubbing, do you limit the times when scrubbing can
>> > happen?
>> > Are these HDDs? EC?
>> >
>> > > On Mar 2, 2023, at 07:20, Joffrey <joff.au@xxxxxxxxx> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I have many 'not {deep-}scrubbed in time' warnings and 1 PG
>> > > remapped+backfilling, and I don't understand why this backfilling is
>> > > taking so long.
>> > >
>> > > root@hbgt-ceph1-mon3:/# ceph -s
>> > >  cluster:
>> > >    id:     c300532c-51fa-11ec-9a41-0050569c3b55
>> > >    health: HEALTH_WARN
>> > >            15 pgs not deep-scrubbed in time
>> > >            13 pgs not scrubbed in time
>> > >
>> > >  services:
>> > >    mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 36h)
>> > >    mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys:
>> > > hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
>> > >    osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
>> > >    rgw: 3 daemons active (3 hosts, 2 zones)
>> > >
>> > >  data:
>> > >    pools:   13 pools, 289 pgs
>> > >    objects: 67.74M objects, 127 TiB
>> > >    usage:   272 TiB used, 769 TiB / 1.0 PiB avail
>> > >    pgs:     288 active+clean
>> > >             1   active+remapped+backfilling
>> > >
>> > >  io:
>> > >    client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
>> > >    recovery: 790 KiB/s, 0 objects/s
>> > >
>> > >
>> > > What can I do to understand this slow recovery (is it the backfill
>> > > action?)
>> > >
>> > > Thanks you
>> > >
>> > > 'Jof
>> > > _______________________________________________
>> > > ceph-users mailing list -- ceph-users@xxxxxxx
>> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >
>> >
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



