You might want to know about this change that's coming:
"This would be a semi-incompatible change with pre-luminous ceph CLI"
cheers,
Gregory
---------- Forwarded message ----------
From: Sage Weil <sweil@xxxxxxxxxx>
Date: Tue, Jun 13, 2017 at 12:34 PM
Subject: cluster health checks
To: jspray@xxxxxxxxxx
Cc: ceph-devel@xxxxxxxxxxxxxxx
I've put together a rework of the cluster health checks at
https://github.com/ceph/ceph/pull/15643
based on John's original proposal in
http://tracker.ceph.com/issues/7192
(with a few changes). I think it's pretty complete except that the
MDSMonitor new-style checks aren't implemented yet.
This would be a semi-incompatible change with pre-luminous ceph in that
- the structured (json/xml) health output is totally different
- the plaintext health *detail* output is different
- specific error messages are a bit different. I was reimplementing them
and took the liberty of revising what information was in the
summary and detail in several cases.
Let me know what you think!
Thanks-
sage
$ ceph -s
  cluster:
    id:     9ee7f49c-57c3-4686-afd1-75b3a8f08c73
    health: HEALTH_WARN
            2 osds down
            1 host (2 osds) down
            1 root (2 osds) down
            8 pgs stale

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: x(active)
    osd: 2 osds: 0 up, 2 in

  data:
    pools:   1 pools, 8 pgs
    objects: 0 objects, 0 bytes
    usage:   414 GB used, 330 GB / 744 GB avail
    pgs:     8 stale+active+clean
$ ceph health detail -f json-pretty
{
    "checks": {
        "OSD_DOWN": {
            "severity": "HEALTH_WARN",
            "message": "2 osds down"
        },
        "OSD_HOST_DOWN": {
            "severity": "HEALTH_WARN",
            "message": "1 host (2 osds) down"
        },
        "OSD_ROOT_DOWN": {
            "severity": "HEALTH_WARN",
            "message": "1 root (2 osds) down"
        },
        "PG_STALE": {
            "severity": "HEALTH_WARN",
            "message": "8 pgs stale"
        }
    },
    "status": "HEALTH_WARN",
    "detail": {
        "OSD_DOWN": [
            "osd.0 (root=default,host=gnit) is down",
            "osd.1 (root=default,host=gnit) is down"
        ],
        "OSD_HOST_DOWN": [
            "host gnit (root=default) (2 osds) is down"
        ],
        "OSD_ROOT_DOWN": [
            "root default (2 osds) is down"
        ],
        "PG_STALE": [
            "pg 0.7 is stale+active+clean, acting [1,0]",
            "pg 0.6 is stale+active+clean, acting [0,1]",
            "pg 0.5 is stale+active+clean, acting [0,1]",
            "pg 0.4 is stale+active+clean, acting [0,1]",
            "pg 0.0 is stale+active+clean, acting [0,1]",
            "pg 0.1 is stale+active+clean, acting [1,0]",
            "pg 0.2 is stale+active+clean, acting [0,1]",
            "pg 0.3 is stale+active+clean, acting [0,1]"
        ]
    }
}
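
As an illustration (not from the original mail), here is a minimal Python sketch of how a monitoring script might consume the structured output above. The `ceph health detail -f json` invocation and the exit-code policy are assumptions of mine, and the final merged format may still change while the PR is under review.

#!/usr/bin/env python
# Minimal sketch: consume the structured health output shown above.
# The invocation and the exit-code policy are illustrative choices,
# not part of the proposal itself.
import json
import subprocess
import sys

health = json.loads(subprocess.check_output(
    ['ceph', 'health', 'detail', '-f', 'json']))

print('overall: %s' % health['status'])
for code, check in sorted(health.get('checks', {}).items()):
    # Each check has a stable code (e.g. OSD_DOWN) plus a severity and message.
    print('%s %s: %s' % (check['severity'], code, check['message']))
    # Per-check detail lines live in the parallel "detail" map.
    for line in health.get('detail', {}).get(code, []):
        print('    ' + line)

# Non-zero exit if anything is worse than HEALTH_OK, for cron-style use.
sys.exit(0 if health['status'] == 'HEALTH_OK' else 1)
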
$ ceph health detail
HEALTH_WARN 2 osds down; 1 host (2 osds) down; 1 root (2 osds) down; 8 pgs stale
OSD_DOWN 2 osds down
    osd.0 (root=default,host=gnit) is down
    osd.1 (root=default,host=gnit) is down
OSD_HOST_DOWN 1 host (2 osds) down
    host gnit (root=default) (2 osds) is down
OSD_ROOT_DOWN 1 root (2 osds) down
    root default (2 osds) is down
PG_STALE 8 pgs stale
    pg 0.7 is stale+active+clean, acting [1,0]
    pg 0.6 is stale+active+clean, acting [0,1]
    pg 0.5 is stale+active+clean, acting [0,1]
    pg 0.4 is stale+active+clean, acting [0,1]
    pg 0.0 is stale+active+clean, acting [0,1]
    pg 0.1 is stale+active+clean, acting [1,0]
    pg 0.2 is stale+active+clean, acting [0,1]
    pg 0.3 is stale+active+clean, acting [0,1]
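
Because this is a semi-incompatible change, external checks that parse the structured health output will likely want a small compatibility shim during the transition. Below is a sketch of one; the pre-luminous key names ("overall_status", "summary") are recalled from older releases rather than taken from this mail, so verify them against your cluster before relying on this.

# Compatibility sketch for tools that must handle both pre-luminous and the
# proposed luminous health JSON. The old key names are assumptions to verify;
# the new keys ("status", "checks", "detail") match the example output above.
def summarize_health(health):
    if 'checks' in health:
        # New-style: checks keyed by a stable code, each with severity + message.
        status = health['status']
        messages = ['%s: %s' % (code, c['message'])
                    for code, c in sorted(health['checks'].items())]
    else:
        # Old-style (assumed pre-luminous layout).
        status = health.get('overall_status', 'HEALTH_UNKNOWN')
        messages = [s['summary'] for s in health.get('summary', [])]
    return status, messages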