Re: ceph-osd restarted via systemd in case of disk error

Stanley Zhang <stanley.zhang@xxxxxxxxxxxx> · Wed, 20 Sep 2017 09:25:44 +1200

    I like this. There are some similar ideas we can probably borrow
    from Cassandra's handling of disk failures:

        # policy for data disk failures:
        # die: shut down gossip and Thrift and kill the JVM for any fs errors or
        #      single-sstable errors, so the node can be replaced.
        # stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
        # stop: shut down gossip and Thrift, leaving the node effectively dead, but
        #       can still be inspected via JMX.
        # best_effort: stop using the failed disk and respond to requests based on
        #              remaining available sstables.  This means you WILL see obsolete
        #              data at CL.ONE!
        # ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra
        disk_failure_policy: stop_paranoid
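
To make the borrowed idea more concrete, here is a minimal Python sketch of how a policy setting like this could be turned into a decision inside a daemon. It is not Cassandra or Ceph code: the `DiskFailurePolicy` enum, the `action_for` function and the action names are all hypothetical. The handling of single-file errors under `stop` and `best_effort` is an assumption read from the comments above, which only say that `die` and `stop_paranoid` react to single-sstable errors.

```python
from enum import Enum


class DiskFailurePolicy(Enum):
    """Policy names copied from the cassandra.yaml comments quoted above."""
    DIE = "die"
    STOP_PARANOID = "stop_paranoid"
    STOP = "stop"
    BEST_EFFORT = "best_effort"
    IGNORE = "ignore"


def action_for(policy: DiskFailurePolicy, single_file_only: bool) -> str:
    """Return a hypothetical action name for a disk error.

    single_file_only is True when only one data file (an sstable, in
    Cassandra terms) is affected rather than the whole filesystem.
    """
    if policy is DiskFailurePolicy.DIE:
        # Any error: exit so the node can be replaced.
        return "exit_process"
    if policy is DiskFailurePolicy.STOP_PARANOID:
        # Any error, even a single bad file: stop talking to the cluster.
        return "stop_network"
    if policy is DiskFailurePolicy.STOP:
        # Assumption: a single bad file is not enough to stop the node.
        return "keep_serving" if single_file_only else "stop_network"
    if policy is DiskFailurePolicy.BEST_EFFORT:
        # Assumption: applied to both kinds of error.
        return "drop_failed_disk"
    # IGNORE: do nothing special and let individual requests fail.
    return "keep_serving"
```

For example, `action_for(DiskFailurePolicy("stop_paranoid"), single_file_only=True)` yields `"stop_network"`, which matches the setting shown above.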
    Regards,
    Stanley

    On 19/09/17 9:16 PM, Manuel Lausch wrote:

      On Tue, 19 Sep 2017 08:24:48 +0000,
      Adrian Saul <Adrian.Saul@xxxxxxxxxxxxxxxxx> wrote:

          I understand what you mean and it's indeed dangerous, but see:
https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service

Looking at the systemd docs it's difficult though:
https://www.freedesktop.org/software/systemd/man/systemd.service.html

If the OSD crashes due to another bug you do want it to restart.

But systemd cannot tell whether the crash was due to a disk I/O
error, a bug in the OSD itself, or something else such as the
OOM killer.

        Perhaps use something like RestartPreventExitStatus and define a
specific exit code for the OSD to use when it exits due to an
I/O error.
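
The exit-code idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not how ceph-osd behaves: the exit code 42, the `exit_code_for` and `run` helpers, and the unit-file lines in the comment are hypothetical choices. Only the `Restart=` and `RestartPreventExitStatus=` directives themselves come from systemd.

```python
import errno

# Hypothetical exit code reserved for "stopped because of a disk I/O error".
# A matching unit file (or drop-in) could then contain, as an assumption:
#   [Service]
#   Restart=on-failure
#   RestartPreventExitStatus=42
EXIT_DISK_IO_ERROR = 42


def exit_code_for(exc: BaseException) -> int:
    """Map an unhandled exception to a process exit code."""
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        # Disk I/O error: use the reserved code so systemd does not restart us.
        return EXIT_DISK_IO_ERROR
    # Anything else is treated as an ordinary failure that may be restarted.
    return 1


def run(main) -> int:
    """Run the daemon's main function and return the exit code to use."""
    try:
        main()
    except Exception as exc:
        return exit_code_for(exc)
    return 0
```

A real daemon would finish with something like `sys.exit(run(main))`. Crashes from other bugs still exit with a different status (or die from a signal), so with `Restart=on-failure` they would still be restarted, while the reserved code would not.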

      Another idea: the OSD daemon keeps running in a defined error state
and only stops its listeners for other OSDs and for clients.
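
This is close to Cassandra's `stop` policy quoted earlier, where the node is effectively dead but can still be inspected. A toy Python model of the idea follows. The class, the state names and the separate `admin` endpoint are hypothetical and only illustrate the state change, not any real OSD interface.

```python
class OsdLikeDaemon:
    """Toy model of a daemon with peer, client and admin endpoints."""

    def __init__(self):
        self.state = "active"
        self.last_error = None
        # Names of the endpoints that are currently accepting connections.
        self.open_listeners = {"cluster", "client", "admin"}

    def on_disk_error(self, detail: str) -> None:
        """Enter the error state instead of exiting the process."""
        self.state = "disk_error"
        self.last_error = detail
        # Stop serving peers and clients; keep the admin endpoint
        # (an assumption) so the failure can still be inspected.
        self.open_listeners -= {"cluster", "client"}

    def status(self) -> dict:
        """Summary an operator could query while the daemon is degraded."""
        return {
            "state": self.state,
            "listeners": sorted(self.open_listeners),
            "last_error": self.last_error,
        }
```

Because the process never exits, systemd sees nothing to restart, and the error state stays visible until an operator intervenes.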

    --
    Stanley Zhang | Senior Operations Engineer
    Telephone: +64 9 302 0515 Fax: +64 9 302 0518
    Mobile: +64 22 318 3664 Freephone: 0800 SMX SMX (769 769)
    SMX Limited: Level 15, 19 Victoria Street West, Auckland, New Zealand
    Web: http://smxemail.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com