Yep, I am.

The issue is solved now .. and by solved, brace yourselves, I mean I had to recreate all OSDs.

As the cluster would not heal itself (because of the original issue), I had to drop every rados pool, stop all OSDs, and destroy & recreate them ..

Yeah, well, hum. There is definitely an underlying issue there. Those OSDs were created and upgraded since Luminous. I have no more clue about the bug. Sadly, there is only so much downtime I can afford on this cluster.

Anyway ..

On 4/9/20 4:51 AM, Ashley Merrick wrote:
> Are you sure you're not being hit by:
>
> ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/
>
> Have all your OSDs successfully completed the fsck?
>
> Reason I say that is I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats"
>
>
> ---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <ceph@xxxxxxxxxxxxxx> wrote ----
>
> Just to confirm this does not get better:
>
> root@backup1:~# ceph status
>   cluster:
>     id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
>     health: HEALTH_WARN
>             20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
>             4/50952060 objects unfound (0.000%)
>             nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
>             1 osds down
>             3 nearfull osd(s)
>             Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale
>             Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull
>             Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized
>             989 pgs not deep-scrubbed in time
>             378 pgs not scrubbed in time
>             10 pool(s) nearfull
>             2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops.
>
>   services:
>     mon: 1 daemons, quorum backup1 (age 8d)
>     mgr: backup1(active, since 8d)
>     osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs
>          flags nobackfill,norecover,noscrub,nodeep-scrub
>     rgw: 1 daemon active (backup1.odiso.net)
>
>   task status:
>
>   data:
>     pools:   10 pools, 2785 pgs
>     objects: 50.95M objects, 92 TiB
>     usage:   121 TiB used, 39 TiB / 160 TiB avail
>     pgs:     29.659% pgs not active
>              13285415/101904120 objects degraded (13.037%)
>              433992/101904120 objects misplaced (0.426%)
>              4/50952060 objects unfound (0.000%)
>              840 active+clean+snaptrim_wait
>              536 down
>              490 active+undersized+degraded+remapped+backfilling
>              326 active+clean
>              113 peering
>              88  active+undersized+degraded
>              83  active+undersized+degraded+remapped+backfill_toofull
>              79  stale+down
>              63  stale+peering
>              51  active+clean+snaptrim
>              24  activating
>              22  active+recovering+degraded
>              19  active+remapped+backfilling
>              13  stale+active+undersized+degraded
>              9   remapped+peering
>              9   active+undersized+remapped+backfilling
>              9   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
>              2   stale+active+clean+snaptrim
>              2   active+undersized
>              1   stale+active+clean+snaptrim_wait
>              1   active+remapped+backfill_toofull
>              1   active+clean+snaptrim_wait+laggy
>              1   active+recovering+undersized+remapped
>              1   down+remapped
>              1   activating+undersized+degraded+remapped
>              1   active+recovering+laggy
>
> On 4/8/20 3:27 PM, Jack wrote:
>> The CPU is used by userspace, not kernelspace
>>
>> Here is the perf top, see attachment
>>
>> Rocksdb eats everything :/
>>
>> On 4/8/20 3:14 PM, Paul Emmerich wrote:
>>> What's the CPU busy with while spinning at 100%?
>>>
>>> Check "perf top" for a quick overview
>>>
>>> Paul
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
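For anyone landing on this thread with the same "legacy (not per-pool) BlueStore omap usage stats" warning, here is a rough sketch of the checks discussed above, assuming a Nautilus/Octopus-era ceph CLI; osd.11 is only an example daemon id and the grep patterns may need adjusting to your exact health output:

# Has bluestore_fsck_quick_fix_on_mount been overridden in the cluster config?
ceph config dump | grep bluestore_fsck_quick_fix_on_mount

# Value actually in effect on a running OSD (osd.11 is just an example)
ceph config show osd.11 | grep bluestore_fsck_quick_fix_on_mount

# Which OSDs still report legacy (not per-pool) omap usage stats
ceph health detail | grep -i omap

# Per Paul's suggestion: a quick system-wide look at what the busy ceph-osd
# processes are spending CPU on (run as root on the OSD host)
perf top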