Are you sure you're not being hit by:
ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/
Have all your OSDs successfully completed the fsck?
The reason I say that is that I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats"
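If it helps, a quick way to check would be something like the following (just a sketch, adjust to your setup):

ceph health detail | grep -i legacy                      # should list which OSDs still report the legacy omap format
ceph config get osd bluestore_fsck_quick_fix_on_mount    # shows whether the on-mount quick-fix/fsck conversion is enabled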
---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <ceph@xxxxxxxxxxxxxx> wrote ----
Just to confirm this does not get better:
root@backup1:~# ceph status
  cluster:
    id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            1 osds down
            3 nearfull osd(s)
            Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale
            Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull
            Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized
            989 pgs not deep-scrubbed in time
            378 pgs not scrubbed in time
            10 pool(s) nearfull
            2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops.

  services:
    mon: 1 daemons, quorum backup1 (age 8d)
    mgr: backup1(active, since 8d)
    osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 1 daemon active (backup1.odiso.net)

  task status:

  data:
    pools:   10 pools, 2785 pgs
    objects: 50.95M objects, 92 TiB
    usage:   121 TiB used, 39 TiB / 160 TiB avail
    pgs:     29.659% pgs not active
             13285415/101904120 objects degraded (13.037%)
             433992/101904120 objects misplaced (0.426%)
             4/50952060 objects unfound (0.000%)
             840 active+clean+snaptrim_wait
             536 down
             490 active+undersized+degraded+remapped+backfilling
             326 active+clean
             113 peering
             88  active+undersized+degraded
             83  active+undersized+degraded+remapped+backfill_toofull
             79  stale+down
             63  stale+peering
             51  active+clean+snaptrim
             24  activating
             22  active+recovering+degraded
             19  active+remapped+backfilling
             13  stale+active+undersized+degraded
             9   remapped+peering
             9   active+undersized+remapped+backfilling
             9   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             2   stale+active+clean+snaptrim
             2   active+undersized
             1   stale+active+clean+snaptrim_wait
             1   active+remapped+backfill_toofull
             1   active+clean+snaptrim_wait+laggy
             1   active+recovering+undersized+remapped
             1   down+remapped
             1   activating+undersized+degraded+remapped
             1   active+recovering+laggy
On 4/8/20 3:27 PM, Jack wrote:
The CPU is being used by userspace, not kernelspace.
Here is the perf top output, see attachment.
RocksDB eats everything :/
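(For reference, roughly how it was captured; just a sketch that grabs a single busy ceph-osd process:

perf top -g -p "$(pidof -s ceph-osd)"    # -g adds call graphs, -p limits sampling to that one OSD pid

You get the same picture on each OSD host.)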
On 4/8/20 3:14 PM, Paul Emmerich wrote:
What's the CPU busy with while spinning at 100%?
Check "perf top" for a quick overview
Paul
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx