Re: osd is immediately down and uses full CPU.


 



Hi, everyone.

The problem was solved.
The PG's epoch on the active OSDs was different from the one on the acting OSDs.
When I removed the PG's head and TEMP directories from the active OSDs, the
blocked requests vanished.
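
For reference, here is a rough sketch of the commands involved. The OSD id (1),
the PG id (1.2a) and the backup path are only placeholders, and it assumes
FileStore OSDs on Jewel; please double-check the paths on your own cluster
before removing anything:

  # compare osdmap epochs (run on each host, for the OSDs it carries)
  for id in 0 1 2 3 4 5 6; do
      echo "osd.$id:"
      ceph daemon osd.$id status | grep -E 'oldest_map|newest_map'
  done

  # stop the affected OSD before touching its data directory
  systemctl stop ceph-osd@1

  # move the PG's head and TEMP directories out of the FileStore data dir
  mkdir -p /root/pg-backup
  mv /var/lib/ceph/osd/ceph-1/current/1.2a_head /root/pg-backup/
  mv /var/lib/ceph/osd/ceph-1/current/1.2a_TEMP /root/pg-backup/

  # restart the OSD and watch the blocked requests / recovery
  systemctl start ceph-osd@1
  ceph -w

(ceph-objectstore-tool with --op remove should be a cleaner way to drop a PG
copy, but simply moving the directories aside is what I did here.)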

Thank you for your kindness.



On Thu, Feb 6, 2020 at 4:19 <ceph@xxxxxxxxxx> wrote:

> What do you guys think about
>
> ceph osd set noout / down
>
> and seeing if the OSD becomes healthy again?
>
> Another idea I have in mind is to remove the said OSD from the
> cluster...
>
> As long as the other OSDs on the same node don't have an issue, I guess
> the disk has a problem...
>
> Just my 2 cents
> - Mehmet
>
> On 3 February 2020 08:22:58 CET, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >
> >On 2/2/20 5:20 PM, Andreas John wrote:
> >> Hello,
> >>
> >> what you see is a stack trace, so the OSD is hitting an unexpected
> >> state (otherwise there would be an error handler).
> >>
> >> The crash happens when the OSD wants to read from a pipe while
> >> processing a heartbeat. To me it sounds like a networking issue.
> >>
> >> I see the other OSDs on that host are healthy, so I would bet there is
> >> an issue with the TCP port that this particular OSD daemon uses.
> >>
> >
> >It could also be that this OSD is so busy internally with other stuff
> >that it doesn't respond to heartbeats and then commits suicide.
> >
> >Combined with the comment that VMs can't read their data it could very
> >well be that the OSD is super busy.
> >
> >Maybe try compacting the LevelDB database.
> >
> >Wido
> >
> >> Try 'netstat -tulpen' or similar to check. Is there a firewall between
> >> the hosts that might cut something off?
> >>
> >>
> >> rgds,
> >>
> >> j.
> >>
> >>
> >> On 02.02.20 04:08, Makito Nishimiya wrote:
> >>> Hi.
> >>>
> >>> This is the cluster informastion.
> >>>
> >>>
> >>> -------------- /var/log/ceph/ceph.osd.1.log ---------------------------
> >>> 2020-02-01 03:47:20.635504 7f86f4e40700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >>> 2020-02-01 03:47:20.635521 7f86f4f41700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >>> 2020-02-01 03:47:20.747876 7f86f3b2d700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >>> 2020-02-01 03:47:20.747903 7f86f3c2e700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >>> 2020-02-01 03:47:21.152436 7f86e2a2e700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had timed out after 15
> >>> 2020-02-01 03:47:21.152441 7f86e2a2e700  1 heartbeat_map is_healthy
> >>> 'OSD::osd_op_tp thread 0x7f86fe35e700' had suicide timed out after 150
> >>> 2020-02-01 03:47:21.157963 7f86e2a2e700 -1 common/HeartbeatMap.cc: In
> >>> function 'bool ceph::HeartbeatMap::_check(const
> >>> ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f86e2a2e700
> >>> time 2020-02-01 03:47:21.152463
> >>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
> >>>
> >>>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> >>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x85) [0x7f875259f425]
> >>>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
> >>> const*, long)+0x2e1) [0x7f87524dede1]
> >>>  3: (ceph::HeartbeatMap::is_healthy()+0xde) [0x7f87524df63e]
> >>>  4: (OSD::handle_osd_ping(MOSDPing*)+0x93f) [0x7f8751f141df]
> >>>  5: (OSD::heartbeat_dispatch(Message*)+0x3cb) [0x7f8751f1540b]
> >>>  6: (DispatchQueue::fast_dispatch(Message*)+0x76) [0x7f875265f9d6]
> >>>  7: (Pipe::reader()+0x1dff) [0x7f875269c68f]
> >>>  8: (Pipe::Reader::entry()+0xd) [0x7f87526a41ad]
> >>>  9: (()+0x7e25) [0x7f87502b8e25]
> >>>  10: (clone()+0x6d) [0x7f874e94234d]
> >>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>>
> >>> --- begin dump of recent events ---
> >>> -10000> 2020-02-01 03:45:29.491110 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.203:6805/5717 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512a200
> >con
> >>> 0x7f8774eb6e80
> >>>  -9999> 2020-02-01 03:45:29.491109 7f86e8245700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.9 10.1.201.202:6811/3006646 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781383400 con 0x7f8774eaf480
> >>>  -9998> 2020-02-01 03:45:29.491123 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.203:6803/5220 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512bc00
> >con
> >>> 0x7f8774eae100
> >>>  -9997> 2020-02-01 03:45:29.491137 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.203:6803/5220 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512aa00
> >con
> >>> 0x7f8774eae280
> >>>  -9996> 2020-02-01 03:45:29.491150 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.203:6811/15624 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512ac00
> >con
> >>> 0x7f8774eae700
> >>>  -9995> 2020-02-01 03:45:29.491148 7f86ec1a4700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.8 10.1.202.202:6812/2131331 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781380200 con 0x7f8774eb5200
> >>>  -9994> 2020-02-01 03:45:29.491163 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.203:6811/15624 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877512be00
> >con
> >>> 0x7f8774eae880
> >>>  -9993> 2020-02-01 03:45:29.491176 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.203:6801/4089 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128a00
> >con
> >>> 0x7f8774eaf780
> >>>  -9992> 2020-02-01 03:45:29.491169 7f86e8447700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.9 10.1.202.202:6811/3006646 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781380400 con 0x7f8774eaf600
> >>>  -9991> 2020-02-01 03:45:29.491167 7f86e975a700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.10 10.1.201.202:6801/3005449 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781380000 con 0x7f8774eb7c00
> >>>  -9990> 2020-02-01 03:45:29.491192 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.203:6801/4089 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775128c00
> >con
> >>> 0x7f8774eaf900
> >>>  -9989> 2020-02-01 03:45:29.491206 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.203:6813/17285 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775451e00
> >con
> >>> 0x7f8774ee3180
> >>>  -9988> 2020-02-01 03:45:29.491184 7f86e9255700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.11 10.1.201.202:6809/2004102 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781385800 con 0x7f8774eade00
> >>>  -9987> 2020-02-01 03:45:29.491202 7f86e9558700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.10 10.1.202.202:6801/3005449 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781385400 con 0x7f8774eb7d80
> >>>  -9986> 2020-02-01 03:45:29.491233 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.203:6813/17285 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f877544fe00
> >con
> >>> 0x7f8774ee3300
> >>>  -9985> 2020-02-01 03:45:29.491238 7f86e9053700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.11 10.1.202.202:6813/2004102 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781383800 con 0x7f8774eadf80
> >>>  -9984> 2020-02-01 03:45:29.491234 7f86e874a700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.12 10.1.201.202:6803/4953 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f8781383a00 con 0x7f8774eae400
> >>>  -9983> 2020-02-01 03:45:29.491247 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.204:6807/13468 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576000
> >con
> >>> 0x7f8774eb5e00
> >>>  -9982> 2020-02-01 03:45:29.491258 7f86ecbae700  1 --
> >>> 10.1.201.201:0/3892596 <== osd.13 10.1.201.202:6805/2390794 12 ====
> >>> osd_ping(ping_reply e126359 stamp 2020-02-01 03:45:29.490356) v2
> >====
> >>> 47+0+0 (4038678352 0 0) 0x7f87813a6400 con 0x7f8774ee1980
> >>>  -9981> 2020-02-01 03:45:29.491266 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.202.204:6807/13468 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8775576e00
> >con
> >>> 0x7f8774eb5f80
> >>>  -9980> 2020-02-01 03:45:29.491291 7f86f9253700  1 --
> >>> 10.1.201.201:0/3892596 --> 10.1.201.204:6809/410784 -- osd_ping(ping
> >>> e126357 stamp 2020-02-01 03:45:29.490356) v2 -- ?+0 0x7f8774ffe400
> >con
> >>> 0x7f8774eac900
> >>>
> >>> ---------------- ceph -s
> >>> ------------------------------------------------------
> >>>
> >>>     cluster 57d2e172-0306-4f8a-946c-a4ae1e95b26b
> >>>      health HEALTH_WARN
> >>>             7 pgs backfill_wait
> >>>             11 pgs backfilling
> >>>             9 pgs degraded
> >>>             4 pgs recovery_wait
> >>>             21 pgs stuck unclean
> >>>             6 pgs undersized
> >>>             100 requests are blocked > 32 sec
> >>>             recovery 649692/75245727 objects degraded (0.863%)
> >>>             recovery 1027694/75245727 objects misplaced (1.366%)
> >>>             recovery 1/24998284 unfound (0.000%)
> >>>             pool default.rgw.buckets.data has many more objects per pg
> >>>             than average (too few pgs?)
> >>>      monmap e1: 3 mons at
> >>> {cephmon01=10.1.202.199:6789/0,cephmon02=10.1.202.198:6789/0,cephmon03=10.1.202.197:6789/0}
> >>>             election epoch 562, quorum 0,1,2
> >>> cephmon03,cephmon02,cephmon01
> >>>      osdmap e127248: 42 osds: 37 up, 37 in; 20 remapped pgs
> >>>             flags sortbitwise,require_jewel_osds
> >>>       pgmap v92257695: 1568 pgs, 14 pools, 18271 GB data, 24412 kobjects
> >>>             55189 GB used, 45363 GB / 100553 GB avail
> >>>             649692/75245727 objects degraded (0.863%)
> >>>             1027694/75245727 objects misplaced (1.366%)
> >>>             1/24998284 unfound (0.000%)
> >>>                 1540 active+clean
> >>>                    8 active+remapped+backfilling
> >>>                    5 active+remapped+wait_backfill
> >>>                    3 active+clean+scrubbing
> >>>                    3 active+clean+scrubbing+deep
> >>>                    3 active+undersized+degraded+remapped+backfilling
> >>>                    2 active+undersized+degraded+remapped+wait_backfill
> >>>                    2 active+recovery_wait+degraded
> >>>                    1 active+recovery_wait+undersized+degraded+remapped
> >>>                    1 active+recovery_wait+degraded+remapped
> >>> recovery io 239 MB/s, 187 objects/s
> >>>   client io 575 kB/s wr, 0 op/s rd, 37 op/s wr
> >>>
> >>> ---------------- ceph osd tree
> >>> ------------------------------------------------------
> >>>
> >>> [root@ceph01 ceph]# ceph osd tree
> >>> ID WEIGHT    TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>> -1 108.19864 root default
> >>> -2  19.09381     host ceph01
> >>>  0   2.72769         osd.0        up  1.00000          1.00000
> >>>  1   2.72769         osd.1      down        0          1.00000 <-- now down
> >>>  2   2.72769         osd.2        up  1.00000          1.00000
> >>>  5   2.72769         osd.5        up  1.00000          1.00000
> >>>  6   2.72768         osd.6        up  1.00000          1.00000
> >>>  3   2.72768         osd.3        up  1.00000          1.00000
> >>>  4   2.72769         osd.4        up  1.00000          1.00000
> >>> -3  19.09383     host ceph02
> >>>  8   2.72769         osd.8        up  1.00000          1.00000
> >>>  9   2.72769         osd.9        up  1.00000          1.00000
> >>> 10   2.72769         osd.10       up  1.00000          1.00000
> >>> 12   2.72769         osd.12       up  1.00000          1.00000
> >>> 11   2.72769         osd.11       up  1.00000          1.00000
> >>>  7   2.72768         osd.7        up  1.00000          1.00000
> >>> 13   2.72769         osd.13       up  1.00000          1.00000
> >>> -4  16.36626     host ceph03
> >>> 14   2.72769         osd.14       up  1.00000          1.00000
> >>> 16   2.72769         osd.16       up  1.00000          1.00000
> >>> 17   2.72769         osd.17       up  1.00000          1.00000
> >>> 19   2.72769         osd.19       up  1.00000          1.00000
> >>> 15   1.81850         osd.15       up  1.00000          1.00000
> >>> 18   1.81850         osd.18       up  1.00000          1.00000
> >>> 20   1.81850         osd.20       up  1.00000          1.00000
> >>> -5  15.45706     host ceph04
> >>> 23   2.72769         osd.23       up  1.00000          1.00000
> >>> 24   2.72769         osd.24       up  1.00000          1.00000
> >>> 27   2.72769         osd.27     down        0          1.00000 <-- down for more than 3 months
> >>> 21   1.81850         osd.21       up  1.00000          1.00000
> >>> 22   1.81850         osd.22       up  1.00000          1.00000
> >>> 25   1.81850         osd.25       up  1.00000          1.00000
> >>> 26   1.81850         osd.26       up  1.00000          1.00000
> >>> -6  19.09384     host ceph05
> >>> 28   2.72769         osd.28       up  1.00000          1.00000
> >>> 29   2.72769         osd.29       up  1.00000          1.00000
> >>> 30   2.72769         osd.30       up  1.00000          1.00000
> >>> 31   2.72769         osd.31     down        0          1.00000 <-- down for more than 3 months
> >>> 32   2.72769         osd.32       up  1.00000          1.00000
> >>> 34   2.72769         osd.34       up  1.00000          1.00000
> >>> 33   2.72769         osd.33     down        0          1.00000 <-- down for more than 3 months
> >>> -7  19.09384     host ceph06
> >>> 35   2.72769         osd.35       up  1.00000          1.00000
> >>> 36   2.72769         osd.36       up  1.00000          1.00000
> >>> 37   2.72769         osd.37       up  1.00000          1.00000
> >>> 39   2.72769         osd.39       up  1.00000          1.00000
> >>> 40   2.72769         osd.40       up  1.00000          1.00000
> >>> 41   2.72769         osd.41       up  1.00000          1.00000
> >>> 38   2.72769         osd.38     down        0          1.00000 <-- down for more than 3 months
> >>>
> >>>
> >>> ------------------------------
> >>>
> >>> On 2020/02/02 11:20, Makito Nishimiya (西宮 牧人) wrote:
> >>>> Servers: 6 (7 OSDs each), 42 OSDs in total
> >>>> OS: CentOS 7
> >>>> Ceph: 10.2.5
> >>>>
> >>>> Hi, everyone
> >>>>
> >>>> The cluster is used for VM image storage and object storage.
> >>>> I have a bucket that holds more than 20 million objects.
> >>>>
> >>>> Now I have a problem where the cluster blocks operations.
> >>>>
> >>>> Suddenly the cluster blocked operations, and the VMs couldn't read their disks.
> >>>> After a few hours, osd.1 went down.
> >>>>
> >>>> There are no disk failure messages in dmesg,
> >>>> and no errors in smartctl -a /dev/sde.
> >>>>
> >>>> I tried to bring osd.1 back up, but it goes down again soon.
> >>>> Just after restarting osd.1, the VMs can access their disks again.
> >>>> But osd.1 always uses 100% CPU, so the cluster marks osd.1 down and
> >>>> the OSD dies by suicide timeout.
> >>>>
> >>>> I found that the osdmap epoch of osd.1 is different from the other OSDs'.
> >>>> So I think that is why osd.1 died.
> >>>>
> >>>>
> >>>> Questions.
> >>>> (1) Why does the epoch of osd.1 differ from the other OSDs' epochs?
> >>>>
> >>>>   I checked every OSD's oldest_map and newest_map with "ceph daemon
> >>>> osd.X status".
> >>>>   All the OSDs' epochs are the same, except osd.1's.
> >>>>
> >>>> (2) Why does osd.1 use 100% CPU?
> >>>>
> >>>>   After the cluster marked osd.1 down, osd.1 stays busy.
> >>>>   When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1
> >>>> doesn't answer.
> >>>>
> >>>>
> >>>> Thank you.
> >>>
>


-- 

Ninja Tools Co., Ltd. (忍者ツールズ株式会社)
Makito Nishimiya (西宮 牧人)

Gyoen Plaza Bldg. #901, 29 Daikyo-cho, Shinjuku-ku, Tokyo 160-0015
Tel: 03-4405-9826
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



