Re: Large number of misplaced PGs but little backfill going on


 



On 24-03-2024 13:41, Tyler Stachecki wrote:
On Sat, Mar 23, 2024, 4:26 AM Torkil Svensgaard <torkil@xxxxxxxx> wrote:

Hi

... Using mclock with high_recovery_ops profile.

What is the bottleneck here? I would have expected a huge number of
simultaneous backfills. Backfill reservation logjam?


mClock is very buggy in my experience and frequently leads to issues like
this. Try using regular backfill and see if the problem goes away.

Hi Tyler

Just tried switching to wpq, same thing.
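For readers retracing these steps: both the scheduler switch and the backfill limit are OSD options set through `ceph config set`. Below is a minimal Python sketch that only builds the commands and, by default, prints them rather than running them. The helper names are hypothetical, and the value 4 is just an example, not a recommendation from this thread. `osd_op_queue` is read when an OSD starts, so switching between `mclock_scheduler` and `wpq` only takes effect after the OSDs are restarted.

```python
import subprocess
from typing import List, Optional

# osd_op_queue values relevant to this thread.
SCHEDULERS = ("wpq", "mclock_scheduler")


def scheduler_commands(scheduler: str, max_backfills: Optional[int] = None) -> List[List[str]]:
    """Build `ceph config set` argument lists for the OSD op queue and,
    optionally, osd_max_backfills. Nothing is executed here (hypothetical helper)."""
    if scheduler not in SCHEDULERS:
        raise ValueError(f"unexpected osd_op_queue value: {scheduler!r}")
    cmds = [["ceph", "config", "set", "osd", "osd_op_queue", scheduler]]
    if max_backfills is not None:
        cmds.append(["ceph", "config", "set", "osd", "osd_max_backfills", str(max_backfills)])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or invoke the `ceph` CLI when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the two commands. The OSDs still need a restart
    # before the new osd_op_queue value is used.
    run(scheduler_commands("wpq", max_backfills=4))
```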

I'm inclined to think it must be a read reservation logjam of some sort, given that increasing osd_max_backfills had an immediate effect and we have 4 empty hosts as the main write targets. Here's the output for one such OSD from the script Alexander linked:

"
osd.539: gimpy =>0 0.0B <=42 1.54T (Δ1.54T) drive=9.0% 358.72G/3.90T crush=9.0% 358.72G/3.90T
  <-11.54f          waiting   44.8G  539<-421  1070 of 230320, 0.5%
  <-37.538          waiting   22.3G  539<-121  2819 of 556087, 0.5%
  <-11.507          waiting   45.1G  539<-61   912 of 139227, 0.7%
  <-37.450          waiting   22.3G  539<-220  1235 of 632776, 0.2%
  <-11.458          waiting   45.3G  539<-121  178 of 279150, 0.1%
  <-37.47c          waiting   22.2G  539<-83   2281 of 634472, 0.4%
  <-37.434          waiting   22.1G  539<-78   9496 of 316052, 3.0%
  <-11.3f3          waiting   44.9G  539<-109  2375 of 231055, 1.0%
  <-37.3d3          waiting   22.0G  539<-73   2144 of 316508, 0.7%
  <-37.3c5          waiting   22.2G  539<-83   313880 of 313880, 100.0%
  <-11.3c1          waiting   44.8G  539<-223  93878 of 230270, 40.8%
  <-37.3a4          waiting   21.9G  539<-85   4604 of 315504, 1.5%
  <-11.344          waiting   44.5G  539<-63   728 of 182876, 0.4%
  <-6.1ca           waiting  100.9G  539<-443  36076 of 56270, 64.1%
  <-4.1a2           waiting  157.6G  539<-218  508 of 91456, 0.6%
  <-37.1ba          waiting   22.2G  539<-64   316848 of 316848, 100.0%
  <-37.84           waiting   22.0G  539<-33   4380 of 237633, 1.8%
  <-37.ad           waiting   22.2G  539<-77   6730 of 396635, 1.7%
  <-37.36           waiting   22.1G  539<-47   2170 of 395955, 0.5%
  <-11.b9           waiting   45.1G  539<-223  0 of 231940, 0.0%
  <-37.11c          waiting   22.1G  539<-33   9952 of 316448, 3.1%
  <-11.144          waiting   45.1G  539<-207  528 of 278094, 0.2%
  <-37.2ae          waiting   22.1G  539<-224  2565 of 712539, 0.4%
  <-37.285          waiting   22.0G  539<-65   441 of 315336, 0.1%
  <-37.2ef          waiting   22.0G  539<-414  2124 of 475410, 0.4%
  <-37.674          waiting   22.0G  539<-56   60 of 236511, 0.0%
  <-37.655          waiting   22.3G  539<-143  237316 of 237381, 100.0%
  <-11.6b0          waiting   44.9G  539<-282  1131 of 277122, 0.4%
  <-37.71a          waiting   22.2G  539<-49   82865 of 315684, 26.2%
  <-11.789          waiting   45.0G  539<-196  736 of 277584, 0.3%
  <-11.7cf          waiting   44.8G  539<-127  143 of 276582, 0.1%
  <-11.7f2          waiting   45.2G  539<-272  145857 of 185680, 78.6%
  <-37.7dd          waiting   22.0G  539<-72   0 of 393475, 0.0%
  <-37.7d9          waiting   22.2G  539<-37   930 of 237831, 0.4%
  <-11.7fb          waiting   45.2G  539<-78   1062 of 279042, 0.4%
  <-37.7d2          waiting   22.0G  539<-71   2409 of 631368, 0.4%
  <-11.8db          waiting   44.9G  539<-84   108 of 277182, 0.0%
  <-11.9b6          waiting   44.8G  539<-74   772 of 184432, 0.4%
  <-11.b0b          waiting   45.0G  539<-166  2569 of 231430, 1.1%
  <-11.d42          waiting   45.2G  539<-118  15428 of 46429, 33.2%
  <-11.d5f          waiting   44.8G  539<-64   4 of 184356, 0.0%
  <-11.d98          waiting   45.1G  539<-418  0 of 278568, 0.0%
"

All 42 of them are waiting for something.
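A listing like the one above can be tallied mechanically. The following is a minimal Python sketch that assumes the column layout is PG id, state, size, target<-source OSD, "done of total", and percent. That layout is inferred from the output itself, not from the script's documentation. `max_concurrent` is a hypothetical toy model built on the documented behaviour that `osd_max_backfills` caps backfills to or from a single OSD, counted separately for reads and writes. It is not Ceph's actual reservation code, and it only sees the PGs in the listing.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed layout of each incoming-backfill line (inferred from the listing):
#   <-PGID  STATE  SIZE  TARGET<-SOURCE  DONE of TOTAL, PCT%
LINE_RE = re.compile(
    r"<-(?P<pgid>\d+\.[0-9a-f]+)\s+(?P<state>\w+)\s+"
    r"(?P<size>[\d.]+)(?P<unit>[KMGT])\s+"
    r"(?P<dst>\d+)<-(?P<src>\d+)\s+"
    r"(?P<done>\d+) of (?P<total>\d+),\s+(?P<pct>[\d.]+)%"
)
# Size units relative to G, assuming the script prints binary units.
UNIT_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


@dataclass
class Backfill:
    pgid: str
    state: str
    size_gib: float
    dst: int  # OSD receiving the data (the write side)
    src: int  # OSD the script lists as the source (assumed read side)
    done: int
    total: int


def parse(text):
    """Extract one Backfill per matching line; other lines are ignored."""
    pgs = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            pgs.append(Backfill(
                pgid=m["pgid"], state=m["state"],
                size_gib=float(m["size"]) * UNIT_TO_GIB[m["unit"]],
                dst=int(m["dst"]), src=int(m["src"]),
                done=int(m["done"]), total=int(m["total"]),
            ))
    return pgs


def summarize(pgs):
    return {
        "by_state": dict(Counter(p.state for p in pgs)),
        "by_source": dict(Counter(p.src for p in pgs)),
        # Progress shows everything copied, yet the PG is still listed.
        "fully_copied": [p.pgid for p in pgs if p.done == p.total],
        "total_gib": sum(p.size_gib for p in pgs),
    }


def max_concurrent(pgs, max_backfills):
    """Toy model: admit PGs in listed order while the source OSD has a free
    read slot and the target OSD a free write slot, with `max_backfills`
    slots of each kind per OSD. Ignores priorities and EC shard layout."""
    reads, writes, active = Counter(), Counter(), []
    for p in pgs:
        if reads[p.src] < max_backfills and writes[p.dst] < max_backfills:
            reads[p.src] += 1
            writes[p.dst] += 1
            active.append(p.pgid)
    return active


# A few lines copied from the osd.539 listing above.
SAMPLE = """\
  <-11.54f          waiting   44.8G  539<-421  1070 of 230320, 0.5%
  <-37.538          waiting   22.3G  539<-121  2819 of 556087, 0.5%
  <-11.458          waiting   45.3G  539<-121  178 of 279150, 0.1%
  <-37.3c5          waiting   22.2G  539<-83   313880 of 313880, 100.0%
  <-37.47c          waiting   22.2G  539<-83   2281 of 634472, 0.4%
  <-6.1ca           waiting  100.9G  539<-443  36076 of 56270, 64.1%
"""

if __name__ == "__main__":
    pgs = parse(SAMPLE)
    print(summarize(pgs))
    for k in (1, 3):
        print(k, max_concurrent(pgs, k))
```

With these six sample lines, the toy model admits one PG at `max_backfills=1` and three at `max_backfills=3`, because every line targets osd.539 and so they all compete for its write slots.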

Best regards,

Torkil

Tyler


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark