Drained OSDs are still ACTING_PRIMARY - causing high IO latency on clients

Hi

I have been observing high latencies and hanging mount points while draining an OSD since the Octopus release, and the problem is still present on the latest Pacific.

Cluster setup:

Ceph Pacific 16.2.7

CephFS with an EC data pool

EC profile setup:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=2
plugin=jerasure
technique=reed_sol_van
w=8
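
For reference, the listing above is what "ceph osd erasure-code-profile get <profile-name>" prints. A profile like this would be created roughly as follows; the name "ec-k10-m2" is only a placeholder for illustration, not necessarily the name used on our cluster:

# sketch only - create a k=10/m=2 jerasure profile with host failure domain
ceph osd erasure-code-profile set ec-k10-m2 k=10 m=2 plugin=jerasure technique=reed_sol_van crush-failure-domain=host crush-root=default
# print the resulting profile (gives the listing above)
ceph osd erasure-code-profile get ec-k10-m2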

Description:

If we have a broken drive, we remove it from the Ceph cluster by draining it first, i.e. setting its CRUSH weight to 0:

ceph osd crush reweight osd.1 0
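
To confirm the reweight took effect and to watch the drain progress, something like the following can be used (osd.1 is just the example OSD from above; this is only a sketch):

# CRUSH weight should now show as 0 for osd.1
ceph osd tree | grep -w 'osd.1'
# watch backfill/recovery progress while the OSD drains
ceph -s
ceph osd df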

On Nautilus this normally did not affect clients. But since the upgrade to Octopus (and from Octopus through the current Pacific release) I observe very high IO latencies on clients (10 s and higher) while an OSD is being drained.

By debugging I found out that the drained OSD is still listed as the ACTING_PRIMARY, and this happens only on EC pools and only since Octopus. To be sure, I tested it again on Nautilus, where the behavior is correct and the drained OSD is no longer listed in the UP or ACTING sets of its PGs.
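
A quick way to see which PGs still report the drained OSD as primary (osd.70 and PG 16.1fff are taken from the debug output below; the exact commands are just a sketch):

# list PGs whose primary is osd.70
ceph pg ls-by-primary osd.70
# or check a single PG's up/acting sets directly
ceph pg map 16.1fff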

Even setting the primary affinity of the given OSD to 0 has no effect on the EC pool.
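
For clarity, the primary-affinity change I mean is of this form (osd.70 is the example OSD from the debug output below):

# sketch - drop osd.70's primary affinity to 0
ceph osd primary-affinity osd.70 0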

Below is my debug output:

Buggy behavior on Octopus and Pacific:

Before draining osd.70:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES       OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  STATE         STATE_STAMP                      VERSION          REPORTED           UP                       UP_PRIMARY  ACTING                   ACTING_PRIMARY  LAST_SCRUB       SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN
16.1fff  2269     0                   0         0          0        8955297727  0            0           2449  2449      active+clean  2022-05-19T08:41:55.241734+0200  19403690'275685  19407588:19607199  [70,206,216,375,307,57]  70          [70,206,216,375,307,57]  70              19384365'275621  2022-05-19T08:41:55.241493+0200  19384365'275621  2022-05-19T08:41:55.241493+0200  0
dumped pgs


After setting osd.70's CRUSH weight to 0 (osd.70 is still the acting primary):

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES       OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  STATE                          STATE_STAMP                      VERSION          REPORTED           UP                       UP_PRIMARY  ACTING                   ACTING_PRIMARY  LAST_SCRUB       SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN
16.1fff  2269     0                   0         2269       0        8955297727  0            0           2449  2449      active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200  19403690'275685  19407668:19607289  [71,206,216,375,307,57]  71          [70,206,216,375,307,57]  70              19384365'275621  2022-05-19T08:41:55.241493+0200  19384365'275621  2022-05-19T08:41:55.241493+0200  0
dumped pgs


Correct behavior on Nautilus:

Before draining osd.10:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES    OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE         STATE_STAMP                 VERSION  REPORTED  UP        UP_PRIMARY  ACTING    ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                 LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP            SNAPTRIMQ_LEN
2.4e     2        0                   0         0          0        8388608  0            0           2    2         active+clean  2022-05-20 02:13:47.432104  61'2     75:40     [10,0,7]  10          [10,0,7]  10              0'0         2022-05-20 01:44:36.217286  0'0              2022-05-20 01:44:36.217286  0

After setting osd.10's CRUSH weight to 0 (behavior is correct: osd.10 is no longer listed or used):


root@nautilus1:~# ceph pg dump pgs | head -2
PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES     OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE         STATE_STAMP                 VERSION  REPORTED  UP        UP_PRIMARY  ACTING    ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP                 LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP            SNAPTRIMQ_LEN
2.4e     14       0                   0         0          0        58720256  0            0           18   18        active+clean  2022-05-20 02:18:59.414812  75'18    80:43     [22,0,7]  22          [22,0,7]  22              0'0         2022-05-20 01:44:36.217286  0'0              2022-05-20 01:44:36.217286  0


Now the question is: is this intended behavior?

Or is it a bug?

Thank you!
