Hi, no, the pool is EC.

20. 5. 2022 18:19:22 Dan van der Ster <dvanders@xxxxxxxxx>:

> Hi,
>
> Just a curiosity... It looks like you're comparing an EC pool in
> Octopus to a replicated pool in Nautilus. Does primary affinity work
> for you in Octopus on a replicated pool? And does a Nautilus EC pool
> work?
>
> .. Dan
>
> On Fri., May 20, 2022, 13:53 Denis Polom <denispolom@xxxxxxxxx> wrote:
>> Hi,
>>
>> I have been observing high latencies and hanging mount points while
>> draining an OSD ever since the Octopus release, and the problem is
>> still present on the latest Pacific.
>>
>> Cluster setup:
>>
>> Ceph Pacific 16.2.7
>>
>> CephFS with an EC data pool
>>
>> EC profile setup:
>>
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=10
>> m=2
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>> Description:
>>
>> When a drive breaks, we remove it from the Ceph cluster by draining
>> it first, i.e. by setting its CRUSH weight to 0:
>>
>> ceph osd crush reweight osd.1 0
>>
>> On Nautilus this never affected clients. But since the upgrade to
>> Octopus (and on every release since, up to the current Pacific) I
>> observe very high IO latencies on clients (10 s and higher) while an
>> OSD is being drained.
>>
>> While debugging I found that the drained OSD is still listed as the
>> acting primary, and that this happens only on EC pools and only
>> since Octopus. To be sure, I retested on Nautilus, where the
>> behavior is correct: the drained OSD no longer appears in the UP and
>> ACTING sets of its PGs.
>>
>> Even setting the primary affinity of the given OSD to 0 has no
>> effect on the EC pool.
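>> (For reference, a minimal sketch of the exact commands; osd.70 from
>> the debug output below is assumed:)
>>
>> # drain the OSD: move all of its data elsewhere
>> ceph osd crush reweight osd.70 0
>>
>> # the primary-affinity setting mentioned above, which has no effect
>> # on the EC pool here
>> ceph osd primary-affinity osd.70 0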
>> Below is my debug output.
>>
>> Buggy behavior on Octopus and Pacific:
>>
>> Before draining osd.70:
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269 0 0 0 0 8955297727 0 0 2449 2449 active+clean 2022-05-19T08:41:55.241734+0200 19403690'275685 19407588:19607199 [70,206,216,375,307,57] 70 [70,206,216,375,307,57] 70 19384365'275621 2022-05-19T08:41:55.241493+0200 19384365'275621 2022-05-19T08:41:55.241493+0200 0
>> dumped pgs
>>
>> After setting the crush weight of osd.70 to 0 (osd.70 is still the
>> acting primary):
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269 0 0 2269 0 8955297727 0 0 2449 2449 active+remapped+backfill_wait 2022-05-20T08:51:54.249071+0200 19403690'275685 19407668:19607289 [71,206,216,375,307,57] 71 [70,206,216,375,307,57] 70 19384365'275621 2022-05-19T08:41:55.241493+0200 19384365'275621 2022-05-19T08:41:55.241493+0200 0
>> dumped pgs
>>
>> Correct behavior on Nautilus:
>>
>> Before draining osd.10:
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 2.4e 2 0 0 0 0 8388608 0 0 2 2 active+clean 2022-05-20 02:13:47.432104 61'2 75:40 [10,0,7] 10 [10,0,7] 10 0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>>
>> After setting the crush weight of osd.10 to 0 (the behavior is
>> correct: osd.10 is no longer listed and no longer used):
>>
>> root@nautilus1:~# ceph pg dump pgs | head -2
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 2.4e 14 0 0 0 0 58720256 0 0 18 18 active+clean 2022-05-20 02:18:59.414812 75'18 80:43 [22,0,7] 22 [22,0,7] 22 0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>>
>> Now the question is: is this intended behavior, or is it a bug?
>>
>> Thank you!
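>> P.S. The acting primary can also be checked directly per PG or per
>> OSD instead of grepping the full pg dump. A minimal sketch, assuming
>> PG 16.1fff and osd.70 from the Octopus output above:
>>
>> # print the up and acting sets of a single PG
>> ceph pg map 16.1fff
>>
>> # list all PGs that currently have osd.70 as their primary
>> ceph pg ls-by-primary osd.70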