Re: [Octopus] OSD overloading

Just to confirm, this does not get better:

root@backup1:~# ceph status
  cluster:
    id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            1 osds down
            3 nearfull osd(s)
            Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale
            Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull
            Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized
            989 pgs not deep-scrubbed in time
            378 pgs not scrubbed in time
            10 pool(s) nearfull
            2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops.

  services:
    mon: 1 daemons, quorum backup1 (age 8d)
    mgr: backup1(active, since 8d)
    osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 1 daemon active (backup1.odiso.net)

  task status:

  data:
    pools:   10 pools, 2785 pgs
    objects: 50.95M objects, 92 TiB
    usage:   121 TiB used, 39 TiB / 160 TiB avail
    pgs:     29.659% pgs not active
             13285415/101904120 objects degraded (13.037%)
             433992/101904120 objects misplaced (0.426%)
             4/50952060 objects unfound (0.000%)
             840 active+clean+snaptrim_wait
             536 down
             490 active+undersized+degraded+remapped+backfilling
             326 active+clean
             113 peering
             88  active+undersized+degraded
             83  active+undersized+degraded+remapped+backfill_toofull
             79  stale+down
             63  stale+peering
             51  active+clean+snaptrim
             24  activating
             22  active+recovering+degraded
             19  active+remapped+backfilling
             13  stale+active+undersized+degraded
             9   remapped+peering
             9   active+undersized+remapped+backfilling
             9   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             2   stale+active+clean+snaptrim
             2   active+undersized
             1   stale+active+clean+snaptrim_wait
             1   active+remapped+backfill_toofull
             1   active+clean+snaptrim_wait+laggy
             1   active+recovering+undersized+remapped
             1   down+remapped
             1   activating+undersized+degraded+remapped
             1   active+recovering+laggy

On 4/8/20 3:27 PM, Jack wrote:
> The CPU is used by userspace, not kernelspace
> 
> Here is the perf top, see attachment
> 
> Rocksdb eats everything :/
> 
> 
> On 4/8/20 3:14 PM, Paul Emmerich wrote:
>> What's the CPU busy with while spinning at 100%?
>>
>> Check "perf top" for a quick overview
>>
>>
>> Paul
>>
> 
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


