Re: osd goes down immediately and uses 100% CPU

How can I tell whether an OSD is super busy? Thanks.
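
For reference, this is roughly what I have been checking so far (a rough
sketch, not a recipe; osd.1 and the default admin socket path are just what
applies on my hosts):

  # which ceph-osd process is burning CPU, and how busy the disks are
  ps -eo pid,pcpu,pmem,args | grep '[c]eph-osd'
  iostat -x 1

  # ask the daemon itself over its admin socket
  ceph daemon osd.1 dump_ops_in_flight    # requests currently stuck inside the OSD
  ceph daemon osd.1 dump_historic_ops     # recent slow requests and where they spent time
  ceph daemon osd.1 perf dump             # internal counters (queue lengths, latencies)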

Wido den Hollander <wido@xxxxxxxx>

>
>
> On 2/2/20 5:20 PM, Andreas John wrote:
> > Hello,
> >
> > what you see is a stack trace, so the OSD is hitting an unexpected
> > state (otherwise there would be an error handler).
> >
> > The crash happens when the OSD tries to read from a pipe while processing
> > a heartbeat. To me it sounds like a networking issue.
> >
> > I see that the other OSDs on that host are healthy, so I would bet there is an
> > issue with the TCP port that this particular OSD daemon used.
> >
>
> It could also be that this OSD is so busy internally with other stuff
> that it doesn't respond to heartbeats and then commits suicide.
>
> Combined with the comment that VMs can't read their data, it could very
> well be that the OSD is super busy.
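>
> If you want to test that theory, you could temporarily raise the op thread
> timeouts so the OSD gets a chance to chew through its backlog instead of
> asserting; the values here are only an example, not a recommendation to
> keep permanently:
>
> ceph tell osd.1 injectargs '--osd-op-thread-timeout 60 --osd-op-thread-suicide-timeout 600'
>
> (or set osd_op_thread_timeout / osd_op_thread_suicide_timeout under [osd]
> in ceph.conf and restart the daemon)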
>
> Maybe try compacting the LevelDB database.
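>
> A rough sketch of how that could look (the omap path assumes a default
> FileStore layout, and on Jewel the exact ceph-kvstore-tool arguments may
> differ slightly, so double-check before running it):
>
> # option 1: compact on the next start -- under [osd] in ceph.conf set
> #   leveldb_compact_on_mount = true
> # and then restart the daemon:
> systemctl restart ceph-osd@1
>
> # option 2: compact the omap store offline while the OSD is stopped
> systemctl stop ceph-osd@1
> ceph-kvstore-tool /var/lib/ceph/osd/ceph-1/current/omap compact
> systemctl start ceph-osd@1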
>
> Wido
>
> > Try 'netstat -tulpen' or so to check. Is there a firewall between the
> > hosts that might cut something off?
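> >
> > For example (osd.1 and the address/port are just placeholders taken from
> > the log above):
> >
> > # the addresses and ports the cluster expects osd.1 to be on
> > ceph osd find 1
> > # the ports the local ceph-osd processes actually listen on
> > ss -tlnp | grep ceph-osd
> > # and from another OSD host, check that one of those ports is reachable
> > timeout 3 bash -c '</dev/tcp/10.1.201.201/6805' && echo reachable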
> >
> >
> > rgds,
> >
> > j.
> >
> >
> > On 02.02.20 04:08, Makito Nishimiya wrote:
> >> Hi.
> >>
> >> This is the cluster information.
> >>
> >>
> >> -------------- /var/log/ceph/ceph.osd.1.log ---------------------------
> >> 2020-02-01 03:47:20.635504 7f86f4e40700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >> 2020-02-01 03:47:20.635521 7f86f4f41700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >> 2020-02-01 03:47:20.747876 7f86f3b2d700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >> 2020-02-01 03:47:20.747903 7f86f3c2e700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >> 2020-02-01 03:47:21.152436 7f86e2a2e700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >> 2020-02-01 03:47:21.152441 7f86e2a2e700  1 heartbeat_map is_healthy
> >> 'OSD::osd_op_tp thread 0x7f86fe35e700' had suicide timed out after 150
> >> 2020-02-01 03:47:21.157963 7f86e2a2e700 -1 common/HeartbeatMap.cc: In
> >> function 'bool ceph::HeartbeatMap::_check(const
> >> ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f86e2a2e700
> >> time 2020-02-01 03:47:21.152463
> >> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
> >>
> >>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x85) [0x7f875259f425]
> >>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
> >> const*, long)+0x2e1) [0x7f87524dede1]
> >>  3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f87524df63e]
> >>  4: (OSD::handle_osd_ping(MOSDPing*)+0x93f) [0x7f8751f141df]
> >>  5: (OSD::heartbeat_dispatch(Message*)+0x3cb) [0x7f8751f1540b]
> >>  6: (DispatchQueue::fast_dispatch(Message*)+0x76) [0x7f875265f9d6]
> >>  7: (Pipe::reader()+0x1dff) [0x7f875269c68f]
> >>  8: (Pipe::Reader::entry()+0xd) [0x7f87526a41ad]
> >>  9: (()+0x7e25) [0x7f87502b8e25]
> >>  10: (clone()+0x6d) [0x7f874e94234d]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >>
> >> --- begin dump of recent events ---
> >> -10000> 2020-02-01 03:45:29.491110 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.203:6805/5717 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512a200 con
> >> 0x7f8774eb6e80
> >>  -9999> 2020-02-01 03:45:29.491109 7f86e8245700  1 --
> >> 10.1.201.201:0/3892596 <== osd.9 10.1.201.202:6811/3006646 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781383400 con 0x7f8774eaf480
> >>  -9998> 2020-02-01 03:45:29.491123 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.203:6803/5220 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512bc00 con
> >> 0x7f8774eae100
> >>  -9997> 2020-02-01 03:45:29.491137 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.203:6803/5220 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512aa00 con
> >> 0x7f8774eae280
> >>  -9996> 2020-02-01 03:45:29.491150 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.203:6811/15624 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512ac00 con
> >> 0x7f8774eae700
> >>  -9995> 2020-02-01 03:45:29.491148 7f86ec1a4700  1 --
> >> 10.1.201.201:0/3892596 <== osd.8 10.1.202.202:6812/2131331 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781380200 con 0x7f8774eb5200
> >>  -9994> 2020-02-01 03:45:29.491163 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.203:6811/15624 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512be00 con
> >> 0x7f8774eae880
> >>  -9993> 2020-02-01 03:45:29.491176 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.203:6801/4089 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128a00 con
> >> 0x7f8774eaf780
> >>  -9992> 2020-02-01 03:45:29.491169 7f86e8447700  1 --
> >> 10.1.201.201:0/3892596 <== osd.9 10.1.202.202:6811/3006646 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781380400 con 0x7f8774eaf600
> >>  -9991> 2020-02-01 03:45:29.491167 7f86e975a700  1 --
> >> 10.1.201.201:0/3892596 <== osd.10 10.1.201.202:6801/3005449 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781380000 con 0x7f8774eb7c00
> >>  -9990> 2020-02-01 03:45:29.491192 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.203:6801/4089 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128c00 con
> >> 0x7f8774eaf900
> >>  -9989> 2020-02-01 03:45:29.491206 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.203:6813/17285 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775451e00 con
> >> 0x7f8774ee3180
> >>  -9988> 2020-02-01 03:45:29.491184 7f86e9255700  1 --
> >> 10.1.201.201:0/3892596 <== osd.11 10.1.201.202:6809/2004102 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781385800 con 0x7f8774eade00
> >>  -9987> 2020-02-01 03:45:29.491202 7f86e9558700  1 --
> >> 10.1.201.201:0/3892596 <== osd.10 10.1.202.202:6801/3005449 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781385400 con 0x7f8774eb7d80
> >>  -9986> 2020-02-01 03:45:29.491233 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.203:6813/17285 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877544fe00 con
> >> 0x7f8774ee3300
> >>  -9985> 2020-02-01 03:45:29.491238 7f86e9053700  1 --
> >> 10.1.201.201:0/3892596 <== osd.11 10.1.202.202:6813/2004102 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781383800 con 0x7f8774eadf80
> >>  -9984> 2020-02-01 03:45:29.491234 7f86e874a700  1 --
> >> 10.1.201.201:0/3892596 <== osd.12 10.1.201.202:6803/4953 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f8781383a00 con 0x7f8774eae400
> >>  -9983> 2020-02-01 03:45:29.491247 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.204:6807/13468 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576000 con
> >> 0x7f8774eb5e00
> >>  -9982> 2020-02-01 03:45:29.491258 7f86ecbae700  1 --
> >> 10.1.201.201:0/3892596 <== osd.13 10.1.201.202:6805/2390794 12 ====
> >> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
> >> 47+0+0 (4038678352 0 0) 0x7f87813a6400 con 0x7f8774ee1980
> >>  -9981> 2020-02-01 03:45:29.491266 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.202.204:6807/13468 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576e00 con
> >> 0x7f8774eb5f80
> >>  -9980> 2020-02-01 03:45:29.491291 7f86f9253700  1 --
> >> 10.1.201.201:0/3892596 --> 10.1.201.204:6809/410784 -- osd_ping(ping
> >> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8774ffe400 con
> >> 0x7f8774eac900
> >>
> >> ---------------- ceph -s ------------------------------------------------------
> >>
> >>     cluster 57d2e172-0306-4f8a-946c-a4ae1e95b26b
> >>      health HEALTH_WARN
> >>             7 pgs backfill_wait
> >>             11 pgs backfilling
> >>             9 pgs degraded
> >>             4 pgs recovery_wait
> >>             21 pgs stuck unclean
> >>             6 pgs undersized
> >>             100 requests are blocked > 32 sec
> >>             recovery 649692/75245727 objects degraded (0.863%)
> >>             recovery 1027694/75245727 objects misplaced (1.366%)
> >>             recovery 1/24998284 unfound (0.000%)
> >>             pool default.rgw.buckets.data has many more objects per pg
> >> than average (too few pgs?)
> >>      monmap e1: 3 mons at
> >> {cephmon01=10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0}
> >>             election epoch 562, quorum 0,1,2 cephmon03,cephmon02,cephmon01
> >>      osdmap e127248: 42 osds: 37 up, 37 in; 20 remapped pgs
> >>             flags sortbitwise,require_jewel_osds
> >>       pgmap v92257695: 1568 pgs, 14 pools, 18271 GB data, 24412 kobjects
> >>             55189 GB used, 45363 GB / 100553 GB avail
> >>             649692/75245727 objects degraded (0.863%)
> >>             1027694/75245727 objects misplaced (1.366%)
> >>             1/24998284 unfound (0.000%)
> >>                 1540 active+clean
> >>                    8 active+remapped+backfilling
> >>                    5 active+remapped+wait_backfill
> >>                    3 active+clean+scrubbing
> >>                    3 active+clean+scrubbing+deep
> >>                    3 active+undersized+degraded+remapped+backfilling
> >>                    2 active+undersized+degraded+remapped+wait_backfill
> >>                    2 active+recovery_wait+degraded
> >>                    1 active+recovery_wait+undersized+degraded+remapped
> >>                    1 active+recovery_wait+degraded+remapped
> >> recovery io 239 MB/s, 187 objects/s
> >>   client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
> >>
> >> ---------------- ceph osd tree ------------------------------------------------------
> >>
> >> [root@ceph01 ceph]# ceph osd tree
> >> ID WEIGHT    TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >> -1 108.19864 root default
> >> -2  19.09381     host ceph01
> >>  0   2.72769         osd.0        up  1.00000          1.00000
> >>  1   2.72769         osd.1      down        0          1.00000 <-- down now
> >>  2   2.72769         osd.2        up  1.00000          1.00000
> >>  5   2.72769         osd.5        up  1.00000          1.00000
> >>  6   2.72768         osd.6        up  1.00000          1.00000
> >>  3   2.72768         osd.3        up  1.00000          1.00000
> >>  4   2.72769         osd.4        up  1.00000          1.00000
> >> -3  19.09383     host ceph02
> >>  8   2.72769         osd.8        up  1.00000          1.00000
> >>  9   2.72769         osd.9        up  1.00000          1.00000
> >> 10   2.72769         osd.10       up  1.00000          1.00000
> >> 12   2.72769         osd.12       up  1.00000          1.00000
> >> 11   2.72769         osd.11       up  1.00000          1.00000
> >>  7   2.72768         osd.7        up  1.00000          1.00000
> >> 13   2.72769         osd.13       up  1.00000          1.00000
> >> -4  16.36626     host ceph03
> >> 14   2.72769         osd.14       up  1.00000          1.00000
> >> 16   2.72769         osd.16       up  1.00000          1.00000
> >> 17   2.72769         osd.17       up  1.00000          1.00000
> >> 19   2.72769         osd.19       up  1.00000          1.00000
> >> 15   1.81850         osd.15       up  1.00000          1.00000
> >> 18   1.81850         osd.18       up  1.00000          1.00000
> >> 20   1.81850         osd.20       up  1.00000          1.00000
> >> -5  15.45706     host ceph04
> >> 23   2.72769         osd.23       up  1.00000          1.00000
> >> 24   2.72769         osd.24       up  1.00000          1.00000
> >> 27   2.72769         osd.27     down        0          1.00000 <-- down more than 3 months ago
> >> 21   1.81850         osd.21       up  1.00000          1.00000
> >> 22   1.81850         osd.22       up  1.00000          1.00000
> >> 25   1.81850         osd.25       up  1.00000          1.00000
> >> 26   1.81850         osd.26       up  1.00000          1.00000
> >> -6  19.09384     host ceph05
> >> 28   2.72769         osd.28       up  1.00000          1.00000
> >> 29   2.72769         osd.29       up  1.00000          1.00000
> >> 30   2.72769         osd.30       up  1.00000          1.00000
> >> 31   2.72769         osd.31     down        0          1.00000 <-- down more than 3 months ago
> >> 32   2.72769         osd.32       up  1.00000          1.00000
> >> 34   2.72769         osd.34       up  1.00000          1.00000
> >> 33   2.72769         osd.33     down        0          1.00000 <-- down more than 3 months ago
> >> -7  19.09384     host ceph06
> >> 35   2.72769         osd.35       up  1.00000          1.00000
> >> 36   2.72769         osd.36       up  1.00000          1.00000
> >> 37   2.72769         osd.37       up  1.00000          1.00000
> >> 39   2.72769         osd.39       up  1.00000          1.00000
> >> 40   2.72769         osd.40       up  1.00000          1.00000
> >> 41   2.72769         osd.41       up  1.00000          1.00000
> >> 38   2.72769         osd.38     down        0          1.00000 <-- down more than 3 months ago
> >>
> >>
> >> ------------------------------
> >>
> >> On 2020/02/02 11:20, 西宮 牧人 wrote:
> >>> Servers: 6 (7 OSDs each), 42 OSDs in total
> >>> OS: CentOS 7
> >>> Ceph: 10.2.5
> >>>
> >>> Hi, everyone
> >>>
> >>> The cluster is used for VM image storage and object storage.
> >>> And I have a bucket which has more than 20 million objects.
> >>>
> >>> Now I have a problem where the cluster blocks operations.
> >>>
> >>> Suddenly the cluster blocked operations, and the VMs couldn't read their disks.
> >>> After a few hours, osd.1 went down.
> >>>
> >>> There are no disk failure messages in dmesg,
> >>> and no errors from smartctl -a /dev/sde.
> >>>
> >>> I tried to bring osd.1 back up, but it goes down again soon after.
> >>> Just after restarting osd.1, the VMs can access their disks again.
> >>> But osd.1 always uses 100% CPU, so the cluster marks osd.1 down and
> >>> the OSD dies from the suicide timeout.
> >>>
> >>> I found that the osdmap epoch of osd.1 differs from the others'.
> >>> So I think osd.1 is dead.
> >>>
> >>>
> >>> Question.
> >>> (1) Why does the epoch of osd.1 differ from the other OSDs' epochs?
> >>>
> >>>   I checked every OSD's oldest_map and newest_map with "ceph daemon osd.X status".
> >>>   All OSDs' epochs are the same number except osd.1's.
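> >>>   Roughly like this, looping over the default admin socket paths on each
> >>> host:
> >>>
> >>>   for s in /var/run/ceph/ceph-osd.*.asok; do
> >>>       echo "$s"
> >>>       ceph daemon "$s" status | grep -E 'oldest_map|newest_map'
> >>>   done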
> >>>
> >>> (2) Why does osd.1 use 100% CPU?
> >>>
> >>>   After the cluster marked osd.1 down, osd.1 stays busy.
> >>>   When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
> >>> doesn't answer.
> >>>
> >>>
> >>> Thank you.
> >>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



