Re: ceph-osd restarted via systemd in case of disk error

I like this. There are some similar ideas we could probably borrow from Cassandra's handling of disk failures:

# policy for data disk failures:
# die: shut down gossip and Thrift and kill the JVM for any fs errors or
#      single-sstable errors, so the node can be replaced.
# stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
#       can still be inspected via JMX.
# best_effort: stop using the failed disk and respond to requests based on
#              remaining available sstables.  This means you WILL see obsolete
#              data at CL.ONE!
# ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra
disk_failure_policy: stop_paranoid
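To make the mapping concrete for an OSD, here is a rough Python sketch of how a storage daemon could dispatch on a setting like this. Nothing like it exists in Ceph. The enum, `action_for()` and the action strings are invented for illustration. Treating single-file errors as plain request failures under `stop`, `best_effort` and `ignore` is an assumption based on the comments above, not Cassandra's actual code.

```python
from enum import Enum


class DiskFailurePolicy(Enum):
    """The five values documented for Cassandra's disk_failure_policy."""
    DIE = "die"
    STOP_PARANOID = "stop_paranoid"
    STOP = "stop"
    BEST_EFFORT = "best_effort"
    IGNORE = "ignore"


def action_for(policy: DiskFailurePolicy, whole_fs_error: bool) -> str:
    """Return the action a hypothetical storage daemon would take.

    whole_fs_error=True:  a filesystem-level error (e.g. EIO on the data disk).
    whole_fs_error=False: an error confined to one data file, the
    equivalent of Cassandra's "single-sstable" error.
    """
    if policy is DiskFailurePolicy.DIE:
        return "exit"          # kill the process so the node can be replaced
    if policy is DiskFailurePolicy.STOP_PARANOID:
        return "stop_serving"  # stop even for a single bad file
    if not whole_fs_error:
        return "fail_request"  # ASSUMPTION: other policies just fail the request
    if policy is DiskFailurePolicy.STOP:
        return "stop_serving"  # stay alive for inspection, serve nothing
    if policy is DiskFailurePolicy.BEST_EFFORT:
        return "drop_disk"     # stop using the failed disk, serve the rest
    return "fail_request"      # IGNORE: let requests fail


# Parse the value the same way it would appear in a config file.
policy = DiskFailurePolicy("stop_paranoid")
print(action_for(policy, whole_fs_error=False))  # prints stop_serving
```

With `die` the OSD would exit and the systemd restart question from this thread comes straight back; `stop` and `stop_paranoid` sidestep it by keeping the process alive.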

Regards

Stanley


On 19/09/17 9:16 PM, Manuel Lausch wrote:
On Tue, 19 Sep 2017 08:24:48 +0000,
Adrian Saul <Adrian.Saul@xxxxxxxxxxxxxxxxx> wrote:

I understand what you mean and it's indeed dangerous, but see:
https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service

Looking at the systemd docs it's difficult though:
https://www.freedesktop.org/software/systemd/man/systemd.service.html

If the OSD crashes due to another bug you do want it to restart.

But systemd has no way to tell whether the crash was caused by a
disk I/O error, a bug in the OSD itself, the OOM killer or
something else.
Perhaps use something like RestartPreventExitStatus and define a
specific exit code that the OSD uses when it exits because of an
I/O error.
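This suggestion has two halves. First, the daemon maps a fatal EIO to a dedicated exit status. Second, the unit lists that status in `RestartPreventExitStatus=` next to `Restart=on-failure`, for example `RestartPreventExitStatus=74`. With that setup, systemd leaves the OSD down after a disk error but still restarts it after other crashes. The Python below models both halves under some assumptions:

- The value 74 (`EX_IOERR` from BSD `sysexits.h`) is a hypothetical choice. Nothing in this thread says the OSD exits with any such code.
- `exit_status_for()` and `would_restart()` are invented names.
- `would_restart()` is a simplified model of systemd's documented `on-failure` rules. It ignores timeouts, the watchdog and start rate limiting.

```python
import errno
import signal
from typing import Optional

# HYPOTHETICAL dedicated exit status for "fatal disk I/O error".
# 74 is EX_IOERR from BSD sysexits.h; it is not a Ceph-defined code.
EXIT_DISK_IO_ERROR = 74

# Signals systemd treats as a clean exit, so on-failure does not restart.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def exit_status_for(exc: BaseException) -> int:
    """Map a fatal exception in a hypothetical OSD main loop to an exit status."""
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        return EXIT_DISK_IO_ERROR
    return 1  # any other crash: generic failure, should be restarted


def would_restart(exit_code: Optional[int] = None,
                  killed_by: Optional[int] = None,
                  prevent: frozenset = frozenset({EXIT_DISK_IO_ERROR})) -> bool:
    """Simplified model of Restart=on-failure plus RestartPreventExitStatus=.

    Pass exit_code for a normal exit or killed_by for death by signal.
    Timeouts, watchdog and start rate limiting are not modelled.
    """
    if exit_code is not None:
        if exit_code in prevent:
            return False          # listed status: never auto-restart
        return exit_code != 0     # any other non-zero exit is a failure
    if killed_by is not None:
        return killed_by not in CLEAN_SIGNALS
    return False


if __name__ == "__main__":
    eio = OSError(errno.EIO, "Input/output error")
    print(would_restart(exit_code=exit_status_for(eio)))            # False
    print(would_restart(exit_code=exit_status_for(RuntimeError())))  # True
    print(would_restart(killed_by=signal.SIGKILL))                   # True
```

The OOM-killer case from the quoted message arrives as SIGKILL. That is not one of the four "clean" signals, so it still triggers a restart; only the dedicated exit status is held back.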
Another idea: the OSD daemon keeps running in a defined error state
and only stops listening for connections from other OSDs and clients.
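This alternative is close to the Cassandra `stop` policy quoted above. The process stays up so it can still be inspected, but it closes the sockets that other OSDs and clients would connect to. Below is a minimal Python sketch of that state transition. `Daemon`, `on_disk_error()`, `State` and the two loopback listeners are invented for illustration. A real OSD would also have to stop its heartbeats and its other network endpoints, which this sketch does not attempt.

```python
import socket
import threading
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    ACTIVE = "active"
    DISK_ERROR = "disk_error"  # process alive, but serving nothing


class Daemon:
    """Hypothetical daemon with one listener for peers and one for clients."""

    def __init__(self) -> None:
        self.state = State.ACTIVE
        self.error: Optional[BaseException] = None
        self._lock = threading.Lock()
        self.listeners: Dict[str, socket.socket] = {
            "cluster": self._listen(),  # connections from other OSDs
            "client": self._listen(),   # connections from clients
        }

    @staticmethod
    def _listen() -> socket.socket:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("127.0.0.1", 0))  # ephemeral loopback port, demo only
        s.listen()
        return s

    def on_disk_error(self, exc: BaseException) -> None:
        """Enter the defined error state: close all listeners, keep running."""
        with self._lock:
            if self.state is State.DISK_ERROR:
                return  # already stopped; keep the first error
            self.state = State.DISK_ERROR
            self.error = exc
            for s in self.listeners.values():
                s.close()

    def status(self) -> Dict[str, Optional[str]]:
        """What a local status query (admin-socket style) might report."""
        return {
            "state": self.state.value,
            "error": None if self.error is None else str(self.error),
        }


if __name__ == "__main__":
    d = Daemon()
    d.on_disk_error(OSError(5, "Input/output error"))
    print(d.status())
```

Because the process never exits, systemd's restart logic never comes into play. The trade-off is that something else, such as monitoring or an operator, has to notice the error state.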



--

Stanley Zhang | Senior Operations Engineer
Telephone: +64 9 302 0515 Fax: +64 9 302 0518
Mobile: +64 22 318 3664 Freephone: 0800 SMX SMX (769 769)
SMX Limited: Level 15, 19 Victoria Street West, Auckland, New Zealand
Web: http://smxemail.com
SMX | Cloud Email Hosting & Security

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
