Re: Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)


 



On Sunday, 22 July 2018 at 02:44 +0200, Oliver Freyermuth wrote:
> Since all services are running on these machines - are you by any
> chance running low on memory? 
> Do you have a monitoring of this? 

I have Munin monitoring on all hosts, but there was nothing special to
notice, except for a +3°C temperature increase on the OSD spinning
drives (probably the long self-test).
Memory usage was at about a quarter, CPU load barely noticeable, on a
generally quiet cluster. The temperature increase on the drives was not
matched by any increase in IRQ activity or anything else...
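
To rule out the OOM killer, a minimal check is to grep the kernel log
from before the reboot; this assumes a persistent systemd journal
(Debian defaults to a volatile one, so it may come up empty):

# journalctl -k -b -1 | grep -iE 'out of memory|oom'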

> We observe some strange issues with our servers if they run for a
> long while, and with high memory pressure (more memory is
> ordered...). 
> Then, it seems our Infiniband driver can not allocate sufficiently
> large pages anymore, communication is lost between the Ceph nodes,
> recovery starts,
> memory usage grows even higher from this, etc. 
> In some cases, it seems this may lead to a freeze / lockup (not
> reboot). My feeling is that the CentOS 7.5 kernel is not doing as
> well on memory compaction as the modern kernels do. 

I don't even have a fancy kernel or device, just plain standard Debian.
The uptime was 6 days since the upgrade from 12.2.6...

> Right now, this is just a hunch of mine, but my recommendation would
> be to have some monitoring of the machine and see if something
> strange happens in terms of memory usage, CPU usage, or disk I/O
> (e.g. iowait)
> to further pin down the issue. It may as well be something completely
> different. 
> 
> Other options to investigate would be a potential kernel stacktrace
> in pstore, or something in mcelog. 

I'll dig into those more obscure monitoring tools that keep log files,
yes.
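
For the record, a minimal first pass over those two (assuming the
mcelog package is installed and a persistent journal is available):

# ls -l /sys/fs/pstore/       # kernel oops/panic records preserved across reboots, if any
# journalctl -u mcelog -b -1  # machine-check events from the previous boot
# mcelog --client             # query the running mcelog daemon, if one is running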

Thanks !

> 
> Cheers,
> 	Oliver
> 
> On 21.07.2018 at 14:34, Nicolas Huillard wrote:
> > I forgot to mention that this server, along with all the other Ceph
> > servers in my cluster, does not run anything other than Ceph, and
> > each runs all the Ceph daemons (mon, mgr, mds, 2×osd).
> > 
> > On Saturday, 21 July 2018 at 10:31 +0200, Nicolas Huillard wrote:
> > > Hi all,
> > > 
> > > One of my servers silently shut down last night, with no
> > > explanation whatsoever in any logs. According to the existing
> > > logs, the shutdown (without reboot) happened between
> > > 03:58:20.061452 (last timestamp from
> > > /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308 (new MON
> > > election called, which oxygene didn't answer).
> > > 
> > > Is there any way in which Ceph could silently shut down a server?
> > > Can a SMART self-test influence scrubbing or compaction?
> > > 
> > > The only thing I have is that smartd started a long self-test on
> > > both OSD spinning drives on that host:
> > > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT], starting scheduled Long Self-Test.
> > > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT], starting scheduled Long Self-Test.
> > > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT], starting scheduled Long Self-Test.
> > > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-test in progress, 90% remaining
> > > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-test in progress, 90% remaining
> > > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT], previous self-test completed without error
> > > 
> > > ...and smartctl now says that the self-tests didn't finish (on
> > > both drives):
> > > # 1  Extended offline    Interrupted (host reset)      00%     10636         -
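> > > For reference, that output comes from the SMART self-test log,
> > > i.e. something like:
> > > # smartctl -l selftest /dev/sda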
> > > 
> > > The MON logs on oxygene talk about RocksDB compaction a few
> > > minutes before the shutdown, and a deep-scrub finished earlier:
> > > /var/log/ceph/ceph-osd.6.log
> > > 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log [DBG] : 6.1d deep-scrub starts
> > > 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log [DBG] : 6.1d deep-scrub ok
> > > 2018-07-21 03:43:36.720707 7fd178082700  0 -- 172.22.0.16:6801/478362 >> 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
> > > 
> > > /var/log/ceph/ceph-mgr.oxygene.log
> > > 2018-07-21 03:58:16.060137 7fbcd3777700  1 mgr send_beacon standby
> > > 2018-07-21 03:58:18.060733 7fbcd3777700  1 mgr send_beacon standby
> > > 2018-07-21 03:58:20.061452 7fbcd3777700  1 mgr send_beacon standby
> > > 
> > > /var/log/ceph/ceph-mon.oxygene.log
> > > 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log Time 2018/07/21-03:52:27.702302) [/build/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:1392] [default] Manual compaction from level-0 to level-1 from 'mgrstat .. '
> > > 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 1746] Compacting 1@0 + 1@1 files to L1, score -1.00
> > > 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 1745 Base level 0, inputs: [149507(602KB)], [149505(13MB)]
> > > 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532137947702334, "job": 1746, "event": "compaction_started", "files_L0": [149507], "files_L1": [149505], "score": -1, "input_data_size": 14916379}
> > > 2018-07-21 03:52:27.785532 7f25b5406700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 1746] Generated table #149508: 4904 keys, 14808953 bytes
> > > 2018-07-21 03:52:27.785587 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532137947785565, "cf_name": "default", "job": 1746, "event": "table_file_creation", "file_number": 149508, "file_size": 14808953, "table_properties": {"data
> > > 2018-07-21 03:52:27.785627 7f25b5406700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 1746] Compacted 1@0 + 1@1 files to L1 => 14808953 bytes
> > > 2018-07-21 03:52:27.785656 7f25b5406700  3 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/version_set.cc:2087] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
> > > 2018-07-21 03:52:27.791640 7f25b5406700  4 rocksdb: (Original Log Time 2018/07/21-03:52:27.791526) [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 26843546 files[0 1 0 0 0 0 0]
> > > 2018-07-21 03:52:27.791657 7f25b5406700  4 rocksdb: (Original Log Time 2018/07/21-03:52:27.791563) EVENT_LOG_v1 {"time_micros": 1532137947791548, "job": 1746, "event": "compaction_finished", "compaction_time_micros": 83261, "output_level"
> > > 2018-07-21 03:52:27.792024 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532137947792019, "job": 1746, "event": "table_file_deletion", "file_number": 149507}
> > > 2018-07-21 03:52:27.796596 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532137947796592, "job": 1746, "event": "table_file_deletion", "file_number": 149505}
> > > 2018-07-21 03:52:27.796690 7f25b6408700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:839] [default] Manual compaction starting
> > > ...
> > > 2018-07-21 03:53:33.404428 7f25b5406700  4 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 1748] Compacted 1@0 + 1@1 files to L1 => 14274825 bytes
> > > 2018-07-21 03:53:33.404460 7f25b5406700  3 rocksdb: [/build/ceph-12.2.7/src/rocksdb/db/version_set.cc:2087] More existing levels in DB than needed. max_bytes_for_level_multiplier may not be guaranteed.
> > > 2018-07-21 03:53:33.408360 7f25b5406700  4 rocksdb: (Original Log Time 2018/07/21-03:53:33.408228) [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 26843546 files[0 1 0 0 0 0 0]
> > > 2018-07-21 03:53:33.408381 7f25b5406700  4 rocksdb: (Original Log Time 2018/07/21-03:53:33.408275) EVENT_LOG_v1 {"time_micros": 1532138013408255, "job": 1748, "event": "compaction_finished", "compaction_time_micros": 84964, "output_level"
> > > 2018-07-21 03:53:33.408647 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532138013408641, "job": 1748, "event": "table_file_deletion", "file_number": 149510}
> > > 2018-07-21 03:53:33.413854 7f25b5406700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532138013413849, "job": 1748, "event": "table_file_deletion", "file_number": 149508}
> > > 2018-07-21 03:54:27.634782 7f25bdc17700  0 mon.oxygene@3(peon).data_health(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail 3766 MB
> > > 2018-07-21 03:55:27.635318 7f25bdc17700  0 mon.oxygene@3(peon).data_health(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail 3766 MB
> > > 2018-07-21 03:56:27.635923 7f25bdc17700  0 mon.oxygene@3(peon).data_health(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail 3766 MB
> > > 2018-07-21 03:57:27.636464 7f25bdc17700  0 mon.oxygene@3(peon).data_health(66142) update_stats avail 79% total 4758 MB, used 991 MB, avail 3766 MB
> > > 
> > > I can see no evidence of intrusion or anything else suspicious
> > > (network or physical).
> > > I'm not even sure it was a shutdown rather than a hard reset, but
> > > there is no evidence of any fsck replaying a journal during reboot
> > > either.
> > > The server restarted without problem and the cluster is now
> > > HEALTH_OK.
> > >
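> > > To distinguish a clean shutdown from a hard reset, the wtmp
> > > records should help (a clean shutdown writes a "shutdown" entry,
> > > while a hard reset typically leaves none):
> > > # last -x shutdown reboot | head
> > >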
> > > Hardware:
> > > * ASRock Rack mobos (the BMC/IPMI may have reset the server for
> > > no reason)
> > > * Seagate ST4000VN008 OSD drives
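> > > If the BMC did reset the board, its event log might say so
> > > (assuming ipmitool is available and the BMC keeps a SEL):
> > > # ipmitool sel elist | tail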
> > > 
> 
> 
-- 
Nicolas Huillard
Founding Partner - Technical Director - Dolomède

nhuillard@xxxxxxxxxxx
Landline: +33 9 52 31 06 10
Mobile: +33 6 50 27 69 08
http://www.dolomede.fr/

https://reseauactionclimat.org/planetman/
http://climat-2020.eu/
http://www.350.org/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



