Re: Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

Since all services are running on these machines - are you by any chance running low on memory?
Do you have monitoring in place for this?

We observe some strange issues with our servers when they have been running for a long while under high memory pressure (more memory is on order...).
At some point our InfiniBand driver apparently can no longer allocate sufficiently large pages, communication between the Ceph nodes is lost, recovery starts,
memory usage grows even higher as a result, and so on.
In some cases this seems to lead to a freeze / lockup (not a reboot). My feeling is that the CentOS 7.5 kernel does not handle memory compaction as well as modern kernels do.
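
If you want to check whether memory fragmentation plays a role on your side, the kernel's own counters already give a first impression - just a rough sketch, using the standard procfs paths:

  # free pages per allocation order; if the higher orders are (close to) zero,
  # large contiguous allocations will start to fail
  cat /proc/buddyinfo
  # compaction activity and failures since boot
  grep compact /proc/vmstat
  # overall memory situation
  grep -E 'MemFree|MemAvailable|Slab' /proc/meminfo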

Right now this is just a hunch on my part, but my recommendation would be to set up some monitoring of the machine and watch whether something strange happens in terms of memory usage, CPU usage, or disk I/O (e.g. iowait)
to further pin down the issue. It may just as well be something completely different.
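
If you don't have anything in place yet, even the sysstat tools are enough for a first look (just a sketch, assuming the Debian packaging, where collection has to be enabled in /etc/default/sysstat first):

  apt install sysstat
  # once data has been collected for a while:
  sar -r    # memory usage over time
  sar -u    # CPU usage, including %iowait
  sar -d    # per-device disk I/O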

Other options to investigate would be a potential kernel stack trace in pstore, or entries in mcelog.
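
Something along these lines should show whether anything was left behind (the exact paths and log locations depend on the distribution and on whether the mcelog daemon is running, so treat this as a sketch):

  # crash / oops records preserved across the reboot, if any
  ls -l /sys/fs/pstore/
  # machine check events from the previous boot (needs a persistent journal)
  journalctl -k -b -1 | grep -i -e mce -e 'machine check'
  # or, if mcelog writes to a file on your setup
  cat /var/log/mcelog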

Cheers,
	Oliver

On 21.07.2018 at 14:34, Nicolas Huillard wrote:
> I forgot to mention that this server, along with all the other Ceph
> servers in my cluster, does not run anything other than Ceph, and each
> runs all the Ceph daemons (mon, mgr, mds, 2×osd).
> 
> On Saturday, 21 July 2018 at 10:31 +0200, Nicolas Huillard wrote:
>> Hi all,
>>
>> One of my servers silently shut down last night, with no explanation
>> whatsoever in any of the logs. According to the existing logs, the
>> shutdown (without reboot) happened between 03:58:20.061452 (last
>> timestamp from /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308
>> (a new MON election was called, which oxygene didn't answer).
>>
>> Is there any way in which Ceph could silently shut down a server?
>> Can a SMART self-test influence scrubbing or compaction?
>>
>> The only thing I have is that smartd started a long self-test on both
>> OSD spinning drives on that host:
>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT], starting
>> scheduled Long Self-Test.
>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT], starting
>> scheduled Long Self-Test.
>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT], starting
>> scheduled Long Self-Test.
>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-
>> test in progress, 90% remaining
>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-
>> test in progress, 90% remaining
>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT], previous
>> self-test completed without error
>>
>> ...and smartctl now says that the self-tests didn't finish (on both
>> drives):
>> # 1  Extended offline    Interrupted (host
>> reset)      00%     10636         -
>>
>> The MON logs on oxygene talk about RocksDB compaction a few minutes
>> before the shutdown, and a deep-scrub finished earlier:
>> /var/log/ceph/ceph-osd.6.log
>> 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log
>> [DBG] : 6.1d deep-scrub starts
>> 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log
>> [DBG] : 6.1d deep-scrub ok
>> 2018-07-21 03:43:36.720707 7fd178082700  0 -- 172.22.0.16:6801/478362
>>>> 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801
>>
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>> l=1).handle_connect_msg: challenging authorizer
>>
>> /var/log/ceph/ceph-mgr.oxygene.log
>> 2018-07-21 03:58:16.060137 7fbcd3777700  1 mgr send_beacon standby
>> 2018-07-21 03:58:18.060733 7fbcd3777700  1 mgr send_beacon standby
>> 2018-07-21 03:58:20.061452 7fbcd3777700  1 mgr send_beacon standby
>>
>> /var/log/ceph/ceph-mon.oxygene.log
>> 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log
>> Time 2018/07/21-03:52:27.702302) [/build/ceph-
>> 12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:1392] [default]
>> Manual compaction from level-0 to level-1 from 'mgrstat .. '
>> 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 1746]
>> Compacting 1@0 + 1@1 files to L1, score -1.00
>> 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction
>> start summary: Base version 1745 Base level 0, inputs:
>> [149507(602KB)], [149505(13MB)]
>> 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532137947702334, "job": 1746, "event":
>> "compaction_started", "files_L0": [149507], "files_L1": [149505],
>> "score": -1, "input_data_size": 14916379}
>> 2018-07-21 03:52:27.785532 7f25b5406700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 1746]
>> Generated table #149508: 4904 keys, 14808953 bytes
>> 2018-07-21 03:52:27.785587 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532137947785565, "cf_name": "default", "job": 1746,
>> "event": "table_file_creation", "file_number": 149508, "file_size":
>> 14808953, "table_properties": {"data
>> 2018-07-21 03:52:27.785627 7f25b5406700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 1746]
>> Compacted 1@0 + 1@1 files to L1 => 14808953 bytes
>> 2018-07-21 03:52:27.785656 7f25b5406700  3 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/version_set.cc:2087] More existing levels in DB
>> than needed. max_bytes_for_level_multiplier may not be guaranteed.
>> 2018-07-21 03:52:27.791640 7f25b5406700  4 rocksdb: (Original Log
>> Time 2018/07/21-03:52:27.791526) [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:621] [default] compacted to:
>> base level 1 max bytes base 26843546 files[0 1 0 0 0 0 0]
>> 2018-07-21 03:52:27.791657 7f25b5406700  4 rocksdb: (Original Log
>> Time 2018/07/21-03:52:27.791563) EVENT_LOG_v1 {"time_micros":
>> 1532137947791548, "job": 1746, "event": "compaction_finished",
>> "compaction_time_micros": 83261, "output_level"
>> 2018-07-21 03:52:27.792024 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532137947792019, "job": 1746, "event":
>> "table_file_deletion", "file_number": 149507}
>> 2018-07-21 03:52:27.796596 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532137947796592, "job": 1746, "event":
>> "table_file_deletion", "file_number": 149505}
>> 2018-07-21 03:52:27.796690 7f25b6408700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:839] [default]
>> Manual compaction starting
>> ...
>> 2018-07-21 03:53:33.404428 7f25b5406700  4 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 1748]
>> Compacted 1@0 + 1@1 files to L1 => 14274825 bytes
>> 2018-07-21 03:53:33.404460 7f25b5406700  3 rocksdb: [/build/ceph-
>> 12.2.7/src/rocksdb/db/version_set.cc:2087] More existing levels in DB
>> than needed. max_bytes_for_level_multiplier may not be guaranteed.
>> 2018-07-21 03:53:33.408360 7f25b5406700  4 rocksdb: (Original Log
>> Time 2018/07/21-03:53:33.408228) [/build/ceph-
>> 12.2.7/src/rocksdb/db/compaction_job.cc:621] [default] compacted to:
>> base level 1 max bytes base 26843546 files[0 1 0 0 0 0 0]
>> 2018-07-21 03:53:33.408381 7f25b5406700  4 rocksdb: (Original Log
>> Time 2018/07/21-03:53:33.408275) EVENT_LOG_v1 {"time_micros":
>> 1532138013408255, "job": 1748, "event": "compaction_finished",
>> "compaction_time_micros": 84964, "output_level"
>> 2018-07-21 03:53:33.408647 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532138013408641, "job": 1748, "event":
>> "table_file_deletion", "file_number": 149510}
>> 2018-07-21 03:53:33.413854 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1532138013413849, "job": 1748, "event":
>> "table_file_deletion", "file_number": 149508}
>> 2018-07-21 03:54:27.634782 7f25bdc17700  0 mon.oxygene@3(peon).data_h
>> ealth(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail
>> 3766 MB
>> 2018-07-21 03:55:27.635318 7f25bdc17700  0 mon.oxygene@3(peon).data_h
>> ealth(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail
>> 3766 MB
>> 2018-07-21 03:56:27.635923 7f25bdc17700  0 mon.oxygene@3(peon).data_h
>> ealth(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail
>> 3766 MB
>> 2018-07-21 03:57:27.636464 7f25bdc17700  0 mon.oxygene@3(peon).data_h
>> ealth(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail
>> 3766 MB
>>
>> I can see no evidence of an intrusion or anything else (network or
>> physical). I'm not even sure it was a shutdown rather than a hard
>> reset, but there is no evidence of any fsck replaying a journal
>> during reboot either.
>> The server restarted without problems and the cluster is now
>> HEALTH_OK.
>>
>> Hardware:
>> * ASRock Rack mobos (the BMC/IPMI may have reset the server for no
>> reason)
>> * Western Digital ST4000VN008 OSD drives
>>



