Re: osd is immediately down and uses full CPU.

On 2/2/20 5:20 PM, Andreas John wrote:
> Hello,
> 
> what you see is a stack trace, so the OSD is hitting an unexpected
> state (otherwise there would be an error handler).
> 
> The crash happens when the OSD tries to read from a pipe while processing
> a heartbeat. To me it sounds like a networking issue.
> 
> I see the other OSDs on that host are healthy, so I would bet there is an
> issue with the TCP port that this particular OSD daemon uses.
> 

It could also be that this OSD is so busy internally with other stuff
that it doesn't respond to heartbeats and then commits suicide.
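For reference, the "timed out after 15" and "suicide timed out after 150" lines in
the log below match the Jewel defaults for osd_op_thread_timeout and
osd_op_thread_suicide_timeout. A minimal sketch of checking them, and (as a
stopgap only) giving a busy OSD more headroom, assuming the daemon still answers
on its admin socket:

  # Show the current op-thread timeouts on the affected OSD (Jewel defaults: 15 / 150 s)
  ceph daemon osd.1 config get osd_op_thread_timeout
  ceph daemon osd.1 config get osd_op_thread_suicide_timeout

  # Temporary stopgap: raise the suicide timeout while investigating the load.
  # Revert once the underlying problem is fixed.
  ceph tell osd.1 injectargs '--osd-op-thread-suicide-timeout 600'

  # If the daemon no longer answers (as reported for osd.1), the same value can
  # be set in ceph.conf under [osd.1] before restarting it:
  #   osd op thread suicide timeout = 600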

Combined with the comment that VMs can't read their data, it could very
well be that the OSD is super busy.

Maybe try compacting the OSD's LevelDB database.
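A minimal sketch of what such a compaction could look like on a Jewel filestore
OSD, assuming its omap LevelDB sits under /var/lib/ceph/osd/ceph-1/current/omap
and that the option and tool invocations below exist as-is in this release
(verify both against your version before running anything):

  # Option A: have the OSD compact its omap LevelDB on the next start.
  # Add to ceph.conf under [osd] (or [osd.1]), then restart the daemon:
  #   leveldb_compact_on_mount = true
  systemctl restart ceph-osd@1

  # Option B: compact offline with ceph-kvstore-tool while the OSD is stopped
  # (tool syntax and omap path may differ between releases):
  systemctl stop ceph-osd@1
  ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-1/current/omap compact
  systemctl start ceph-osd@1

Compaction of a large omap can take a while, so expect the (re)start to be slow.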

Wido

> Try 'netstat -tulpen' or similar to check. Is there a firewall between the
> hosts that might be cutting something off?
> 
> 
> rgds,
> 
> j.
> 
> 
> On 02.02.20 04:08, Makito Nishimiya wrote:
>> Hi.
>>
>> This is the cluster information.
>>
>>
>> -------------- /var/log/ceph/ceph.osd.1.log ---------------------------
>> 2020-02-01 03:47:20.635504 7f86f4e40700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>> 2020-02-01 03:47:20.635521 7f86f4f41700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>> 2020-02-01 03:47:20.747876 7f86f3b2d700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>> 2020-02-01 03:47:20.747903 7f86f3c2e700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>> 2020-02-01 03:47:21.152436 7f86e2a2e700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
>> 2020-02-01 03:47:21.152441 7f86e2a2e700  1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had suicide timed out after 150
>> 2020-02-01 03:47:21.157963 7f86e2a2e700 -1 common/HeartbeatMap.cc: In
>> function 'bool ceph::HeartbeatMap::_check(const
>> ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f86e2a2e700
>> time 2020-02-01 03:47:21.152463
>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
>>
>>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x85) [0x7f875259f425]
>>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>> const*, long)+0x2e1) [0x7f87524dede1]
>>  3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f87524df63e]
>>  4: (OSD::handle_osd_ping(MOSDPing*)+0x93f) [0x7f8751f141df]
>>  5: (OSD::heartbeat_dispatch(Message*)+0x3cb) [0x7f8751f1540b]
>>  6: (DispatchQueue::fast_dispatch(Message*)+0x76) [0x7f875265f9d6]
>>  7: (Pipe::reader()+0x1dff) [0x7f875269c68f]
>>  8: (Pipe::Reader::entry()+0xd) [0x7f87526a41ad]
>>  9: (()+0x7e25) [0x7f87502b8e25]
>>  10: (clone()+0x6d) [0x7f874e94234d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- begin dump of recent events ---
>> -10000> 2020-02-01 03:45:29.491110 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.203:6805/5717 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512a200 con
>> 0x7f8774eb6e80
>>  -9999> 2020-02-01 03:45:29.491109 7f86e8245700  1 --
>> 10.1.201.201:0/3892596 <== osd.9 10.1.201.202:6811/3006646 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781383400 con 0x7f8774eaf480
>>  -9998> 2020-02-01 03:45:29.491123 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.203:6803/5220 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512bc00 con
>> 0x7f8774eae100
>>  -9997> 2020-02-01 03:45:29.491137 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.203:6803/5220 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512aa00 con
>> 0x7f8774eae280
>>  -9996> 2020-02-01 03:45:29.491150 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.203:6811/15624 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512ac00 con
>> 0x7f8774eae700
>>  -9995> 2020-02-01 03:45:29.491148 7f86ec1a4700  1 --
>> 10.1.201.201:0/3892596 <== osd.8 10.1.202.202:6812/2131331 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781380200 con 0x7f8774eb5200
>>  -9994> 2020-02-01 03:45:29.491163 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.203:6811/15624 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512be00 con
>> 0x7f8774eae880
>>  -9993> 2020-02-01 03:45:29.491176 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.203:6801/4089 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128a00 con
>> 0x7f8774eaf780
>>  -9992> 2020-02-01 03:45:29.491169 7f86e8447700  1 --
>> 10.1.201.201:0/3892596 <== osd.9 10.1.202.202:6811/3006646 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781380400 con 0x7f8774eaf600
>>  -9991> 2020-02-01 03:45:29.491167 7f86e975a700  1 --
>> 10.1.201.201:0/3892596 <== osd.10 10.1.201.202:6801/3005449 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781380000 con 0x7f8774eb7c00
>>  -9990> 2020-02-01 03:45:29.491192 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.203:6801/4089 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128c00 con
>> 0x7f8774eaf900
>>  -9989> 2020-02-01 03:45:29.491206 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.203:6813/17285 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775451e00 con
>> 0x7f8774ee3180
>>  -9988> 2020-02-01 03:45:29.491184 7f86e9255700  1 --
>> 10.1.201.201:0/3892596 <== osd.11 10.1.201.202:6809/2004102 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781385800 con 0x7f8774eade00
>>  -9987> 2020-02-01 03:45:29.491202 7f86e9558700  1 --
>> 10.1.201.201:0/3892596 <== osd.10 10.1.202.202:6801/3005449 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781385400 con 0x7f8774eb7d80
>>  -9986> 2020-02-01 03:45:29.491233 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.203:6813/17285 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877544fe00 con
>> 0x7f8774ee3300
>>  -9985> 2020-02-01 03:45:29.491238 7f86e9053700  1 --
>> 10.1.201.201:0/3892596 <== osd.11 10.1.202.202:6813/2004102 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781383800 con 0x7f8774eadf80
>>  -9984> 2020-02-01 03:45:29.491234 7f86e874a700  1 --
>> 10.1.201.201:0/3892596 <== osd.12 10.1.201.202:6803/4953 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f8781383a00 con 0x7f8774eae400
>>  -9983> 2020-02-01 03:45:29.491247 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.204:6807/13468 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576000 con
>> 0x7f8774eb5e00
>>  -9982> 2020-02-01 03:45:29.491258 7f86ecbae700  1 --
>> 10.1.201.201:0/3892596 <== osd.13 10.1.201.202:6805/2390794 12 ====
>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2 ====
>> 47+0+0 (4038678352 0 0) 0x7f87813a6400 con 0x7f8774ee1980
>>  -9981> 2020-02-01 03:45:29.491266 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.202.204:6807/13468 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576e00 con
>> 0x7f8774eb5f80
>>  -9980> 2020-02-01 03:45:29.491291 7f86f9253700  1 --
>> 10.1.201.201:0/3892596 --> 10.1.201.204:6809/410784 -- osd_ping(ping
>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8774ffe400 con
>> 0x7f8774eac900
>>
>> ---------------- ceph -s ------------------------------------------------------
>>
>>     cluster 57d2e172-0306-4f8a-946c-a4ae1e95b26b
>>      health HEALTH_WARN
>>             7 pgs backfill_wait
>>             11 pgs backfilling
>>             9 pgs degraded
>>             4 pgs recovery_wait
>>             21 pgs stuck unclean
>>             6 pgs undersized
>>             100 requests are blocked > 32 sec
>>             recovery 649692/75245727 objects degraded (0.863%)
>>             recovery 1027694/75245727 objects misplaced (1.366%)
>>             recovery 1/24998284 unfound (0.000%)
>>             pool default.rgw.buckets.data has many more objects per pg
>> than average (too few pgs?)
>>      monmap e1: 3 mons at
>> {cephmon01=10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0}
>>             election epoch 562, quorum 0,1,2
>> cephmon03,cephmon02,cephmon01
>>      osdmap e127248: 42 osds: 37 up, 37 in; 20 remapped pgs
>>             flags sortbitwise,require_jewel_osds
>>       pgmap v92257695: 1568 pgs, 14 pools, 18271 GB data, 24412 kobjects
>>             55189 GB used, 45363 GB / 100553 GB avail
>>             649692/75245727 objects degraded (0.863%)
>>             1027694/75245727 objects misplaced (1.366%)
>>             1/24998284 unfound (0.000%)
>>                 1540 active+clean
>>                    8 active+remapped+backfilling
>>                    5 active+remapped+wait_backfill
>>                    3 active+clean+scrubbing
>>                    3 active+clean+scrubbing+deep
>>                    3 active+undersized+degraded+remapped+backfilling
>>                    2 active+undersized+degraded+remapped+wait_backfill
>>                    2 active+recovery_wait+degraded
>>                    1 active+recovery_wait+undersized+degraded+remapped
>>                    1 active+recovery_wait+degraded+remapped
>> recovery io 239 MB/s, 187 objects/s
>>   client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
>>
>> ---------------- ceph osd tree ------------------------------------------------
>>
>> [root@ceph01 ceph]# ceph osd tree
>> ID WEIGHT    TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 108.19864 root default
>> -2  19.09381     host ceph01
>>  0   2.72769         osd.0        up  1.00000          1.00000
>>  1   2.72769         osd.1      down        0          1.00000 <-- now down
>>  2   2.72769         osd.2        up  1.00000          1.00000
>>  5   2.72769         osd.5        up  1.00000          1.00000
>>  6   2.72768         osd.6        up  1.00000          1.00000
>>  3   2.72768         osd.3        up  1.00000          1.00000
>>  4   2.72769         osd.4        up  1.00000          1.00000
>> -3  19.09383     host ceph02
>>  8   2.72769         osd.8        up  1.00000          1.00000
>>  9   2.72769         osd.9        up  1.00000          1.00000
>> 10   2.72769         osd.10       up  1.00000          1.00000
>> 12   2.72769         osd.12       up  1.00000          1.00000
>> 11   2.72769         osd.11       up  1.00000          1.00000
>>  7   2.72768         osd.7        up  1.00000          1.00000
>> 13   2.72769         osd.13       up  1.00000          1.00000
>> -4  16.36626     host ceph03
>> 14   2.72769         osd.14       up  1.00000          1.00000
>> 16   2.72769         osd.16       up  1.00000          1.00000
>> 17   2.72769         osd.17       up  1.00000          1.00000
>> 19   2.72769         osd.19       up  1.00000          1.00000
>> 15   1.81850         osd.15       up  1.00000          1.00000
>> 18   1.81850         osd.18       up  1.00000          1.00000
>> 20   1.81850         osd.20       up  1.00000          1.00000
>> -5  15.45706     host ceph04
>> 23   2.72769         osd.23       up  1.00000          1.00000
>> 24   2.72769         osd.24       up  1.00000          1.00000
>> 27   2.72769         osd.27     down        0          1.00000 <-- down for more than 3 months
>> 21   1.81850         osd.21       up  1.00000          1.00000
>> 22   1.81850         osd.22       up  1.00000          1.00000
>> 25   1.81850         osd.25       up  1.00000          1.00000
>> 26   1.81850         osd.26       up  1.00000          1.00000
>> -6  19.09384     host ceph05
>> 28   2.72769         osd.28       up  1.00000          1.00000
>> 29   2.72769         osd.29       up  1.00000          1.00000
>> 30   2.72769         osd.30       up  1.00000          1.00000
>> 31   2.72769         osd.31     down        0          1.00000 <-- down for more than 3 months
>> 32   2.72769         osd.32       up  1.00000          1.00000
>> 34   2.72769         osd.34       up  1.00000          1.00000
>> 33   2.72769         osd.33     down        0          1.00000 <-- down for more than 3 months
>> -7  19.09384     host ceph06
>> 35   2.72769         osd.35       up  1.00000          1.00000
>> 36   2.72769         osd.36       up  1.00000          1.00000
>> 37   2.72769         osd.37       up  1.00000          1.00000
>> 39   2.72769         osd.39       up  1.00000          1.00000
>> 40   2.72769         osd.40       up  1.00000          1.00000
>> 41   2.72769         osd.41       up  1.00000          1.00000
>> 38   2.72769         osd.38     down        0          1.00000 <-- down for more than 3 months
>>
>>
>> ------------------------------
>>
>> On 2020/02/02 11:20, Makito Nishimiya (西宮 牧人) wrote:
>>> Servers: 6 (7 OSDs each), 42 OSDs in total
>>> OS: CentOS 7
>>> Ceph: 10.2.5
>>>
>>> Hi, everyone
>>>
>>> The cluster is used for VM image storage and object storage.
>>> And I have a bucket which has more than 20 million objects.
>>>
>>> Now I have a problem where the cluster blocks operations.
>>>
>>> The cluster suddenly blocked operations, and the VMs couldn't read their disks.
>>> After a few hours, osd.1 went down.
>>>
>>> There are no disk failure messages in dmesg,
>>> and no errors in smartctl -a /dev/sde.
>>>
>>> I tried to bring osd.1 back up, but it goes down again soon after.
>>> Just after restarting osd.1, the VMs can access their disks again.
>>> But osd.1 always uses 100% CPU, then the cluster marks osd.1 down and
>>> the OSD dies on the suicide timeout.
>>>
>>> I found that the osdmap epoch of osd.1 differs from that of the other OSDs.
>>> So I think osd.1 is dead.
>>>
>>>
>>> Questions.
>>> (1) Why does the epoch of osd.1 differ from that of the other OSDs?
>>>
>>>   I checked every OSD's oldest_map and newest_map with "ceph daemon osd.X
>>> status".
>>>   All OSDs report the same epoch except osd.1.
>>>
>>> (2) Why does osd.1 use 100% CPU?
>>>
>>>   Even after the cluster marked osd.1 down, osd.1 stays busy.
>>>   When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
>>> doesn't answer.
>>>
>>>
>>> Thank you.
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



