Re: osd is immidietly down and uses CPU full.

Wido den Hollander <wido@xxxxxxxx> · Mon, 3 Feb 2020 08:45:04 +0100

On 2/3/20 8:39 AM, wes park wrote:
> How to know a OSD is super busy? Thanks.

Check if it's using 100% CPU for example. And check the disk util with
iostat.

Wido

> 
> Wido den Hollander <wido@xxxxxxxx <mailto:wido@xxxxxxxx>>
> 
> 
> 
>     On 2/2/20 5:20 PM, Andreas John wrote:
>     > Hello,
>     >
>     > what you see is an stracktrace, so the OSD is hitting an unexpected
>     > state (Otherwise there would be an error handler).
>     >
>     > The crash happens, when the osd wants to read from pipe when
>     processing
>     > heartbeat. To me it sounds like a networking issue.
>     >
>     > I see the other OSD an that host are healthy, to I would bet there
>     is an
>     > issue with tcp port that this particular osd daemon used.
>     >
> 
>     It could also be that this OSD is so busy internally with other stuff
>     that it doesn't respond to heartbeats and then commits suicide.
> 
>     Combined with the comment that VMs can't read their data it could very
>     well be that the OSD is super busy.
> 
>     Maybe try a compact of the LevelDB database.
> 
>     Wido
> 
>     > Try 'netstat -tulpen' or so to check. If there a firewall between the
>     > host that might cut something off?
>     >
>     >
>     > rgds,
>     >
>     > j.
>     >
>     >
>     > On 02.02.20 04:08, Makito Nishimiya wrote:
>     >> Hi.
>     >>
>     >> This is the cluster informastion.
>     >>
>     >>
>     >> -------------- /var/log/ceph/ceph.osd.1.log
>     ---------------------------
>     >> 2020-02-01 03:47:20.635504 7f86f4e40700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>     >> 2020-02-01 03:47:20.635521 7f86f4f41700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>     >> 2020-02-01 03:47:20.747876 7f86f3b2d700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>     >> 2020-02-01 03:47:20.747903 7f86f3c2e700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>     >> 2020-02-01 03:47:21.152436 7f86e2a2e700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>     >> 2020-02-01 03:47:21.152441 7f86e2a2e700  1 heartbeat_map is_healthy
>     >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had suicide timed out
>     after 150
>     >> 2020-02-01 03:47:21.157963 7f86e2a2e700 -1 common/HeartbeatMap.cc: In
>     >> function 'bool ceph::HeartbeatMap::_check(const
>     >> ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f86e2a2e700
>     >> time 2020-02-01 03:47:21.152463
>     >> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>     >>
>     >>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>     >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>     >> const*)+0x85) [0x7f875259f425]
>     >>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>     >> const*, long)+0x2e1) [0x7f87524dede1]
>     >>  3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f87524df63e]
>     >>  4: (OSD::handle_osd_ping(MOSDPing*)+0x93f) [0x7f8751f141df]
>     >>  5: (OSD::heartbeat_dispatch(Message*)+0x3cb) [0x7f8751f1540b]
>     >>  6: (DispatchQueue::fast_dispatch(Message*)+0x76) [0x7f875265f9d6]
>     >>  7: (Pipe::reader()+0x1dff) [0x7f875269c68f]
>     >>  8: (Pipe::Reader::entry()+0xd) [0x7f87526a41ad]
>     >>  9: (()+0x7e25) [0x7f87502b8e25]
>     >>  10: (clone()+0x6d) [0x7f874e94234d]
>     >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>     >> needed to interpret this.
>     >>
>     >> --- begin dump of recent events ---
>     >> -10000> 2020-02-01 03:45:29.491110 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.203:6805/5717 <http://10.1.202.203:6805/5717> -- osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877512a200 con
>     >> 0x7f8774eb6e80
>     >>  -9999> 2020-02-01 03:45:29.491109 7f86e8245700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.9
>     10.1.201.202:6811/3006646 <http://10.1.201.202:6811/3006646> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781383400 con 0x7f8774eaf480
>     >>  -9998> 2020-02-01 03:45:29.491123 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.203:6803/5220 <http://10.1.201.203:6803/5220> -- osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877512bc00 con
>     >> 0x7f8774eae100
>     >>  -9997> 2020-02-01 03:45:29.491137 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.203:6803/5220 <http://10.1.202.203:6803/5220> -- osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877512aa00 con
>     >> 0x7f8774eae280
>     >>  -9996> 2020-02-01 03:45:29.491150 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.203:6811/15624 <http://10.1.201.203:6811/15624> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877512ac00 con
>     >> 0x7f8774eae700
>     >>  -9995> 2020-02-01 03:45:29.491148 7f86ec1a4700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.8
>     10.1.202.202:6812/2131331 <http://10.1.202.202:6812/2131331> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781380200 con 0x7f8774eb5200
>     >>  -9994> 2020-02-01 03:45:29.491163 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.203:6811/15624 <http://10.1.202.203:6811/15624> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877512be00 con
>     >> 0x7f8774eae880
>     >>  -9993> 2020-02-01 03:45:29.491176 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.203:6801/4089 <http://10.1.201.203:6801/4089> -- osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8775128a00 con
>     >> 0x7f8774eaf780
>     >>  -9992> 2020-02-01 03:45:29.491169 7f86e8447700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.9
>     10.1.202.202:6811/3006646 <http://10.1.202.202:6811/3006646> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781380400 con 0x7f8774eaf600
>     >>  -9991> 2020-02-01 03:45:29.491167 7f86e975a700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.10
>     10.1.201.202:6801/3005449 <http://10.1.201.202:6801/3005449> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781380000 con 0x7f8774eb7c00
>     >>  -9990> 2020-02-01 03:45:29.491192 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.203:6801/4089 <http://10.1.202.203:6801/4089> -- osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8775128c00 con
>     >> 0x7f8774eaf900
>     >>  -9989> 2020-02-01 03:45:29.491206 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.203:6813/17285 <http://10.1.201.203:6813/17285> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8775451e00 con
>     >> 0x7f8774ee3180
>     >>  -9988> 2020-02-01 03:45:29.491184 7f86e9255700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.11
>     10.1.201.202:6809/2004102 <http://10.1.201.202:6809/2004102> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781385800 con 0x7f8774eade00
>     >>  -9987> 2020-02-01 03:45:29.491202 7f86e9558700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.10
>     10.1.202.202:6801/3005449 <http://10.1.202.202:6801/3005449> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781385400 con 0x7f8774eb7d80
>     >>  -9986> 2020-02-01 03:45:29.491233 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.203:6813/17285 <http://10.1.202.203:6813/17285> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f877544fe00 con
>     >> 0x7f8774ee3300
>     >>  -9985> 2020-02-01 03:45:29.491238 7f86e9053700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.11
>     10.1.202.202:6813/2004102 <http://10.1.202.202:6813/2004102> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781383800 con 0x7f8774eadf80
>     >>  -9984> 2020-02-01 03:45:29.491234 7f86e874a700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.12
>     10.1.201.202:6803/4953 <http://10.1.201.202:6803/4953> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f8781383a00 con 0x7f8774eae400
>     >>  -9983> 2020-02-01 03:45:29.491247 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.204:6807/13468 <http://10.1.201.204:6807/13468> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8775576000 con
>     >> 0x7f8774eb5e00
>     >>  -9982> 2020-02-01 03:45:29.491258 7f86ecbae700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> <== osd.13
>     10.1.201.202:6805/2390794 <http://10.1.201.202:6805/2390794> 12 ====
>     >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>     >> 47+0+0 (4038678352 0 0) 0x7f87813a6400 con 0x7f8774ee1980
>     >>  -9981> 2020-02-01 03:45:29.491266 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.202.204:6807/13468 <http://10.1.202.204:6807/13468> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8775576e00 con
>     >> 0x7f8774eb5f80
>     >>  -9980> 2020-02-01 03:45:29.491291 7f86f9253700  1 --
>     >> 10.1.201.201:0/3892596 <http://10.1.201.201:0/3892596> -->
>     10.1.201.204:6809/410784 <http://10.1.201.204:6809/410784> --
>     osd_ping(ping
>     >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0
>     0x7f8774ffe400 con
>     >> 0x7f8774eac900
>     >>
>     >> ---------------- ceph -s
>     >> ------------------------------------------------------
>     >>
>     >>     cluster 57d2e172-0306-4f8a-946c-a4ae1e95b26b
>     >>      health HEALTH_WARN
>     >>             7 pgs backfill_wait
>     >>             11 pgs backfilling
>     >>             9 pgs degraded
>     >>             4 pgs recovery_wait
>     >>             21 pgs stuck unclean
>     >>             6 pgs undersized
>     >>             100 requests are blocked > 32 sec
>     >>             recovery 649692/75245727 objects degraded (0.863%)
>     >>             recovery 1027694/75245727 objects misplaced (1.366%)
>     >>             recovery 1/24998284 unfound (0.000%)
>     >>             pool default.rgw.buckets.data has many more objects
>     per pg
>     >> than average (too few pgs?)
>     >>      monmap e1: 3 mons at
>     >>
>     {cephmon01=10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0
>     <http://10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0>}
>     >>             election epoch 562, quorum 0,1,2
>     >> cephmon03,cephmon02,cephmon01
>     >>      osdmap e127248: 42 osds: 37 up, 37 in; 20 remapped pgs
>     >>             flags sortbitwise,require_jewel_osds
>     >>       pgmap v92257695: 1568 pgs, 14 pools, 18271 GB data, 24412
>     kobjects
>     >>             55189 GB used, 45363 GB / 100553 GB avail
>     >>             649692/75245727 objects degraded (0.863%)
>     >>             1027694/75245727 objects misplaced (1.366%)
>     >>             1/24998284 unfound (0.000%)
>     >>                 1540 active+clean
>     >>                    8 active+remapped+backfilling
>     >>                    5 active+remapped+wait_backfill
>     >>                    3 active+clean+scrubbing
>     >>                    3 active+clean+scrubbing+deep
>     >>                    3 active+undersized+degraded+remapped+backfilling
>     >>                    2
>     active+undersized+degraded+remapped+wait_backfill
>     >>                    2 active+recovery_wait+degraded
>     >>                    1
>     active+recovery_wait+undersized+degraded+remapped
>     >>                    1 active+recovery_wait+degraded+remapped
>     >> recovery io 239 MB/s, 187 objects/s
>     >>   client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
>     >>
>     >> ---------------- ceph osd tree
>     >> ------------------------------------------------------
>     >>
>     >> [root@ceph01 ceph]# ceph osd tree
>     >> ID WEIGHT    TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>     >> -1 108.19864 root default
>     >> -2  19.09381     host ceph01
>     >>  0   2.72769         osd.0        up  1.00000          1.00000
>     >>  1   2.72769         osd.1      down        0          1.00000
>     <-- now
>     >> down
>     >>  2   2.72769         osd.2        up  1.00000          1.00000
>     >>  5   2.72769         osd.5        up  1.00000          1.00000
>     >>  6   2.72768         osd.6        up  1.00000          1.00000
>     >>  3   2.72768         osd.3        up  1.00000          1.00000
>     >>  4   2.72769         osd.4        up  1.00000          1.00000
>     >> -3  19.09383     host ceph02
>     >>  8   2.72769         osd.8        up  1.00000          1.00000
>     >>  9   2.72769         osd.9        up  1.00000          1.00000
>     >> 10   2.72769         osd.10       up  1.00000          1.00000
>     >> 12   2.72769         osd.12       up  1.00000          1.00000
>     >> 11   2.72769         osd.11       up  1.00000          1.00000
>     >>  7   2.72768         osd.7        up  1.00000          1.00000
>     >> 13   2.72769         osd.13       up  1.00000          1.00000
>     >> -4  16.36626     host ceph03
>     >> 14   2.72769         osd.14       up  1.00000          1.00000
>     >> 16   2.72769         osd.16       up  1.00000          1.00000
>     >> 17   2.72769         osd.17       up  1.00000          1.00000
>     >> 19   2.72769         osd.19       up  1.00000          1.00000
>     >> 15   1.81850         osd.15       up  1.00000          1.00000
>     >> 18   1.81850         osd.18       up  1.00000          1.00000
>     >> 20   1.81850         osd.20       up  1.00000          1.00000
>     >> -5  15.45706     host ceph04
>     >> 23   2.72769         osd.23       up  1.00000          1.00000
>     >> 24   2.72769         osd.24       up  1.00000          1.00000
>     >> 27   2.72769         osd.27     down        0          1.00000 <--
>     >> more then 3month ago
>     >> 21   1.81850         osd.21       up  1.00000          1.00000
>     >> 22   1.81850         osd.22       up  1.00000          1.00000
>     >> 25   1.81850         osd.25       up  1.00000          1.00000
>     >> 26   1.81850         osd.26       up  1.00000          1.00000
>     >> -6  19.09384     host ceph05
>     >> 28   2.72769         osd.28       up  1.00000          1.00000
>     >> 29   2.72769         osd.29       up  1.00000          1.00000
>     >> 30   2.72769         osd.30       up  1.00000          1.00000
>     >> 31   2.72769         osd.31     down        0          1.00000 <--
>     >> more then 3month ago
>     >> 32   2.72769         osd.32       up  1.00000          1.00000
>     >> 34   2.72769         osd.34       up  1.00000          1.00000
>     >> 33   2.72769         osd.33     down        0          1.00000 <--
>     >> more then 3month ago
>     >> -7  19.09384     host ceph06
>     >> 35   2.72769         osd.35       up  1.00000          1.00000
>     >> 36   2.72769         osd.36       up  1.00000          1.00000
>     >> 37   2.72769         osd.37       up  1.00000          1.00000
>     >> 39   2.72769         osd.39       up  1.00000          1.00000
>     >> 40   2.72769         osd.40       up  1.00000          1.00000
>     >> 41   2.72769         osd.41       up  1.00000          1.00000
>     >> 38   2.72769         osd.38     down        0          1.00000 <--
>     >> more then 3month ago
>     >>
>     >>
>     >> ------------------------------
>     >>
>     >> On 2020/02/02 11:20, 西宮 牧人 wrote:
>     >>> Servers: 6 (include 7osds) total 42osdsl
>     >>> OS: Centos7
>     >>> Ceph: 10.2.5
>     >>>
>     >>> Hi, everyone
>     >>>
>     >>> The cluster is used for VM image storage and object storage.
>     >>> And I have a bucket which has more than 20 million objects.
>     >>>
>     >>> Now, I have a problem that cluster blocks operation.
>     >>>
>     >>> Suddenly cluster blocked operations, then VMs can't read disk.
>     >>> After a few hours, osd.1 was down.
>     >>>
>     >>> There is no disk fail messages in dmesg.
>     >>> And no error is in smartctl -a /dev/sde.
>     >>>
>     >>> I tried to wake up osd.1, but osd.1 is down soon.
>     >>> Just after re-waking up osd.1, VM can access to the disk.
>     >>> But osd.1 always uses 100% CPU, then cluster marked osd.1 down and
>     >>> the osd was dead by suicide timeout.
>     >>>
>     >>> I found that the osdmap epoch of osd.1 is different from other one.
>     >>> So I think osd.1 was dead.
>     >>>
>     >>>
>     >>> Question.
>     >>> (1) Why does the epoch of osd.1 differ from other osds ones ?
>     >>>
>     >>>   I checked all osds oldest_map and newest_map by ~ceph daemon osd.X
>     >>> status~
>     >>>   All osd's ecpoch are same number except osd.1
>     >>>
>     >>> (2) Why does osd.1 use CPU full?
>     >>>
>     >>>   After the cluster marked osd.1 down, osd.1 keeps up busy.
>     >>>   When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
>     >>> doesn't answer.
>     >>>
>     >>>
>     >>> Thank you.
>     >>
>     _______________________________________________
>     ceph-users mailing list -- ceph-users@xxxxxxx
>     <mailto:ceph-users@xxxxxxx>
>     To unsubscribe send an email to ceph-users-leave@xxxxxxx
>     <mailto:ceph-users-leave@xxxxxxx>
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx