Hi, no, the pool is EC.

20. 5. 2022 18:19:22 Dan van der Ster <dvanders@xxxxxxxxx>:

> Hi,
>
> Just a curiosity... It looks like you're comparing an EC pool in
> Octopus to a replicated pool in Nautilus. Does primary affinity work
> for you in Octopus on a replicated pool? And does a Nautilus EC pool
> work?
>
> .. Dan
>
> On Fri., May 20, 2022, 13:53 Denis Polom <denispolom@xxxxxxxxx> wrote:
>> Hi,
>>
>> I have been observing high latencies and hanging mount points while
>> draining an OSD ever since the Octopus release, and the problem is
>> still present on the latest Pacific.
>>
>> Cluster setup:
>>
>> Ceph Pacific 16.2.7
>>
>> CephFS with an EC data pool
>>
>> EC profile setup:
>>
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=10
>> m=2
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>> Description:
>>
>> When a drive breaks, we remove it from the Ceph cluster by draining
>> it first, i.e. by setting its CRUSH weight to 0:
>>
>> ceph osd crush reweight osd.1 0
>>
>> On Nautilus this never affected clients. But since the upgrade to
>> Octopus (and on every release since, up to the current Pacific) I
>> observe very high IO latencies on clients (10 s and higher) while an
>> OSD is being drained.
>>
>> While debugging I found that the drained OSD is still listed as the
>> acting primary, and that this happens only on EC pools and only
>> since Octopus. To be sure, I retested on Nautilus, where the
>> behavior is correct: the drained OSD no longer appears in the UP and
>> ACTING sets of its PGs.
>>
>> Even setting the primary affinity of the given OSD to 0 has no
>> effect on the EC pool.
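>> (For reference, a minimal sketch of the exact commands; osd.70 from
>> the debug output below is assumed:)
>>
>> # drain the OSD: move all of its data elsewhere
>> ceph osd crush reweight osd.70 0
>>
>> # the primary-affinity setting mentioned above, which has no effect
>> # on the EC pool here
>> ceph osd primary-affinity osd.70 0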
>> Below is my debug output.
>>
>> Buggy behavior on Octopus and Pacific:
>>
>> Before draining osd.70:
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269 0 0 0 0 8955297727 0 0 2449 2449 active+clean 2022-05-19T08:41:55.241734+0200 19403690'275685 19407588:19607199 [70,206,216,375,307,57] 70 [70,206,216,375,307,57] 70 19384365'275621 2022-05-19T08:41:55.241493+0200 19384365'275621 2022-05-19T08:41:55.241493+0200 0
>> dumped pgs
>>
>> After setting the crush weight of osd.70 to 0 (osd.70 is still the
>> acting primary):
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269 0 0 2269 0 8955297727 0 0 2449 2449 active+remapped+backfill_wait 2022-05-20T08:51:54.249071+0200 19403690'275685 19407668:19607289 [71,206,216,375,307,57] 71 [70,206,216,375,307,57] 70 19384365'275621 2022-05-19T08:41:55.241493+0200 19384365'275621 2022-05-19T08:41:55.241493+0200 0
>> dumped pgs
>>
>> Correct behavior on Nautilus:
>>
>> Before draining osd.10:
>>
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 2.4e 2 0 0 0 0 8388608 0 0 2 2 active+clean 2022-05-20 02:13:47.432104 61'2 75:40 [10,0,7] 10 [10,0,7] 10 0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>>
>> After setting the crush weight of osd.10 to 0 (the behavior is
>> correct: osd.10 is no longer listed and no longer used):
>>
>> root@nautilus1:~# ceph pg dump pgs | head -2
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 2.4e 14 0 0 0 0 58720256 0 0 18 18 active+clean 2022-05-20 02:18:59.414812 75'18 80:43 [22,0,7] 22 [22,0,7] 22 0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>>
>> Now the question is: is this intended behavior, or is it a bug?
>>
>> Thank you!
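>> P.S. The acting primary can also be checked directly per PG or per
>> OSD instead of grepping the full pg dump. A minimal sketch, assuming
>> PG 16.1fff and osd.70 from the Octopus output above:
>>
>> # print the up and acting sets of a single PG
>> ceph pg map 16.1fff
>>
>> # list all PGs that currently have osd.70 as their primary
>> ceph pg ls-by-primary osd.70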