Re: PG backfilled slow

The SUSE docs are pretty good for this:

https://www.suse.com/support/kb/doc/?id=000019693

Basically, raise osd-max-backfills / osd-recovery-max-active; this will allow concurrent backfills to the same device.  If you watch the OSD in Grafana you should be able to see the underlying device utilisation, and you can tune the settings until it's reasonably high but the device isn't falling over.  If you set them too high you are just going to end up with an OSD that continually restarts.
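As a rough sketch (the values here are illustrative, not a recommendation; tune them against the device utilisation you actually observe):

    # Persist higher limits in the cluster config:
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 8

    # Or inject into the running OSDs without persisting across restarts:
    ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'

If you don't have Grafana wired up, iostat -x on the OSD hosts gives the same device-utilisation signal (watch the %util column).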
________________________________
From: Peter <petersun@xxxxxxxxxxxx>
Sent: 26 July 2023 17:19
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  PG backfilled slow

Hi all,

I need to replace some disks due to bad sectors. I marked these disks out, and Ceph backfilled and migrated the data as intended. However, after waiting a day I can see these OSDs still have one or more PGs left, and the backfilling is really slow. Only one PG is backfilling at a time now.
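For reference, the drained OSDs are the REWEIGHT 0 rows in the output below; they were marked out along the lines of (exact commands not quoted in this mail):

    # Mark the failing OSDs out so their PGs backfill elsewhere:
    ceph osd out 122 137 138 141 146 148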

host001:~# ceph osd df
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL     %USE   VAR   PGS  STATUS
122    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    1      up
123    hdd  9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  224 KiB  4.7 GiB   7.3 TiB  22.06  0.69   64      up
124    hdd  9.37500   1.00000  9.4 TiB  2.0 TiB  1.7 TiB  211 KiB  4.4 GiB   7.4 TiB  21.14  0.67   61      up
125    hdd  9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  218 KiB  5.0 GiB   7.2 TiB  22.94  0.72   67      up
126    hdd  9.37500   1.00000  9.4 TiB  2.3 TiB  2.0 TiB  235 KiB  4.7 GiB   7.1 TiB  24.50  0.77   72      up
127    hdd  9.37500   1.00000  9.4 TiB  2.4 TiB  2.1 TiB  248 KiB  5.5 GiB   6.9 TiB  25.91  0.82   77      up
128    hdd  9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  349 KiB  5.0 GiB   7.2 TiB  23.52  0.74   69      up
129    hdd  9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  216 KiB  4.6 GiB   7.3 TiB  22.62  0.71   66      up
130    hdd  9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  244 KiB  5.3 GiB   6.9 TiB  26.51  0.83   79      up
131    hdd  9.37500   1.00000  9.4 TiB  2.1 TiB  1.8 TiB  230 KiB  4.0 GiB   7.3 TiB  22.09  0.70   64      up
132    hdd  9.37500   1.00000  9.4 TiB  2.2 TiB  2.0 TiB  231 KiB  5.1 GiB   7.1 TiB  23.93  0.75   70      up
133    hdd  9.37500   1.00000  9.4 TiB  2.7 TiB  2.4 TiB  479 KiB  6.1 GiB   6.7 TiB  28.92  0.91   87      up
134    hdd  9.37500   1.00000  9.4 TiB  2.3 TiB  2.1 TiB  225 KiB  4.9 GiB   7.0 TiB  25.02  0.79   74      up
135    hdd  9.37500   1.00000  9.4 TiB  2.0 TiB  1.7 TiB  395 KiB  4.5 GiB   7.4 TiB  21.46  0.68   62      up
136    hdd  9.37500   1.00000  9.4 TiB  2.8 TiB  2.5 TiB  294 KiB  5.6 GiB   6.6 TiB  29.52  0.93   89      up
137    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    2      up
138    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    5      up
139    hdd  9.37500   1.00000  9.4 TiB  2.4 TiB  2.2 TiB  259 KiB  5.3 GiB   6.9 TiB  25.94  0.82   77      up
140    hdd  9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  355 KiB  4.8 GiB   6.9 TiB  26.86  0.85   80      up
141    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    1      up
142    hdd  9.37500   1.00000  9.4 TiB  2.6 TiB  2.3 TiB  1.6 GiB  4.9 GiB   6.8 TiB  27.43  0.86   83      up
143    hdd  9.37500   1.00000  9.4 TiB  2.7 TiB  2.4 TiB  276 KiB  5.7 GiB   6.7 TiB  28.64  0.90   86      up
144    hdd  9.37500   1.00000  9.4 TiB  2.5 TiB  2.2 TiB  256 KiB  5.5 GiB   6.9 TiB  26.77  0.84   80      up
145    hdd  9.37500   1.00000  9.4 TiB  2.3 TiB  2.0 TiB  248 KiB  5.0 GiB   7.1 TiB  24.46  0.77   72      up
146    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    1      up
147    hdd  9.37500   1.00000  9.4 TiB  2.2 TiB  1.9 TiB  237 KiB  5.1 GiB   7.2 TiB  23.53  0.74   69      up
148    hdd  9.37500         0      0 B      0 B      0 B      0 B      0 B       0 B      0     0    1      up

host001:~# ceph pg dump_stuck
PG_STAT  STATE                          UP             UP_PRIMARY  ACTING         ACTING_PRIMARY
5.3fd    active+remapped+backfill_wait  [145,158,151]         145  [145,151,126]             145
5.3a8      active+remapped+backfilling  [136,133,158]         136  [136,133,167]             136
5.2e0    active+remapped+backfill_wait  [147,158,135]         147  [147,135,166]             147
5.294    active+remapped+backfill_wait  [147,128,164]         147  [147,128,138]             147
5.ef     active+remapped+backfill_wait  [123,134,158]         123  [123,148,137]             123
5.116    active+remapped+backfill_wait  [123,166,145]         123  [123,166,138]             123
5.1e8    active+remapped+backfill_wait  [127,158,157]         127  [127,157,161]             127
5.106    active+remapped+backfill_wait  [124,158,144]         124  [124,144,167]             124
5.1c     active+remapped+backfill_wait  [128,158,155]         128  [128,155,140]             128
5.2ef    active+remapped+backfill_wait  [128,163,153]         128  [128,163,137]             128
5.1e0    active+remapped+backfill_wait  [129,158,153]         129  [129,153,162]             129
5.1d2    active+remapped+backfill_wait  [128,168,149]         128  [128,168,146]             128
5.167    active+remapped+backfill_wait  [129,142,158]         129  [129,142,168]             129
5.f1     active+remapped+backfill_wait  [124,147,158]         124  [124,147,168]             124
5.2c     active+remapped+backfill_wait  [129,159,154]         129  [129,159,141]             129
5.12b    active+remapped+backfill_wait  [128,169,157]         128  [128,169,138]             128
5.3eb    active+remapped+backfill_wait  [136,158,149]         136  [136,149,127]             136
5.6e     active+remapped+backfill_wait  [136,168,152]         136  [136,168,122]             136
5.3d8    active+remapped+backfill_wait  [124,147,134]         124  [124,147,138]             124
5.b4     active+remapped+backfill_wait  [123,142,166]         123  [123,166,138]             123
5.1f5    active+remapped+backfill_wait  [145,153,158]         145  [145,153,169]             145
5.19c    active+remapped+backfill_wait  [129,158,151]         129  [129,151,164]             129
5.b3     active+remapped+backfill_wait  [124,143,158]         124  [124,143,155]             124
5.108    active+remapped+backfill_wait  [136,158,133]         136  [136,133,153]             136

Can anyone suggest what to do to speed up this process?

Thanks,
Peter
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
