Re: Troubleshooting hanging storage backend whenever there is any cluster change

You should definitely stop using `size 3 min_size 1` on your pools.  Go back to the default `min_size 2`.  I'm a little confused why you have 3 different CRUSH rules, since they're all identical.  You only need different CRUSH rules if you're using Erasure Coding or targeting different sets of OSDs (e.g. SSDs vs. HDDs) for different pools.
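
A minimal sketch of that change, assuming the pool names from the
`ceph osd pool ls detail` output below (adjust if yours differ):

    # raise min_size back to the default of 2 on each replicated pool
    ceph osd pool set cephstor1 min_size 2
    ceph osd pool set cephfs_cephstor1_data min_size 2
    ceph osd pool set cephfs_cephstor1_metadata min_size 2

Since all three pools already reference crush_rule 0 ("data"), the
"metadata" and "rbd" rules look unused; once you've confirmed that,
they could be dropped with `ceph osd crush rule rm metadata` and
`ceph osd crush rule rm rbd`.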

All of that said, I don't see anything in those rules that would explain why you're having problems accessing your data while a node is being restarted.  The `ceph status` and `ceph health detail` outputs captured while it's happening will be helpful.
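
If it's hard to catch by hand, a rough sketch like this (run from any
node with admin access to the cluster; the log path and interval are
just examples) will record the state repeatedly so you can pull out
the window where the RBDs hang:

    # append cluster state to a log every 10 seconds while reproducing the issue
    while true; do
        date >> /tmp/ceph-state.log
        ceph status >> /tmp/ceph-state.log
        ceph health detail >> /tmp/ceph-state.log
        sleep 10
    done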

On Thu, Oct 11, 2018 at 3:02 PM Nils Fahldieck - Profihost AG <n.fahldieck@xxxxxxxxxxxx> wrote:
Thanks for your reply. I'll capture a `ceph status` the next time I
encounter an RBD that isn't working. Here's the other output you asked for:

$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "data",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -10000,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "metadata",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -10000,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "rbd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -10000,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

$ ceph osd pool ls detail
pool 5 'cephstor1' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 4096 pgp_num 4096 last_change 1217074 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application rbd
        removed_snaps
[1~9,b~1,d~7d1e8,7d1f6~3d05f,ba256~4bd9,bee30~357,bf188~5531,c46ba~85b3,ccc6e~b599,d820b~1,d820d~1,d820f~1,d8211~1,d8214~1,d8216~1,d8219~2,d821d~1,d821f~1,d8221~1,d8223~1,d8226~2,d8229~1,d822b~2,d822e~2,d8231~3,d8236~1,d8238~2,d823b~1,d823d~3,d8241~1,d8243~1,d8245~1,d8247~3,d824d~1,d824f~1,d8251~1,d8253~1,d8255~2,d8258~1,d825c~1,d825e~2,d8262~1,d8264~1,d8266~1,d8268~2,d826e~2,d8272~1,d8274~1,d8276~8,d8280~1,d8282~1,d8284~1,d8286~1,d8288~1,d828a~1,d828c~1,d828e~1,d8290~1,d8292~1,d8294~3,d8298~1,d829a~2,d829d~1,d82a0~4,d82a6~1,d82a8~2,d82ac~1,d82ae~1,d82b0~1,d82b2~1,d82b5~1,d82b7~1,d82b9~1,d82bb~1,d82bd~1,d82bf~1,d82c1~1,d82c3~2,d82c6~2,d82c9~1,d82cb~1,d82ce~1,d82d0~2,d82d3~1,d82d6~4,d82db~1,d82de~1,d82e0~1,d82e2~1,d82e4~1,d82e6~1,d82e8~1,d82ea~1,d82ed~1,d82ef~1,d82f1~1,d82f3~2,d82f7~2,d82fb~2,d82ff~1,d8301~1,d8303~1,d8305~1,d8307~1,d8309~1,d830b~1,d830e~1,d8311~2,d8314~3,d8318~1,d831a~1,d831c~1,d831f~3,d8323~2,d8329~1,d832b~2,d832f~1,d8331~1,d8333~1,d8335~1,d8338~6,d833f~1,d8341~1,d8343~1,d8345~2,d8349~2,d834c~1,d834e~1,d8350~1,d8352~1,d8354~1,d8356~4,d835b~1,d835d~2,d8360~1,d8362~3,d8366~3,d836b~3,d8370~1,d8372~1,d8374~1,d8376~3,d837a~1,d837c~1,d837e~2,d8381~1,d8383~1,d8385~1,d8387~3,d838b~2,d838e~4,d8393~1,d8396~1,d8398~2,d839b~1,d839d~2,d83a0~2,d83a3~1,d83a5~2,d83a9~2,d83ad~1,d83b0~2,d83b4~2,d83b8~1,d83ba~a,d83c5~1,d83c7~1,d83ca~1,d83cc~1,d83ce~1,d83d0~1,d83d2~6,d83d9~3,d83df~1,d83e1~2,d83e5~1,d83e8~1,d83eb~4,d83f0~1,d83f2~1,d83f4~3,d83f8~3,d83fd~2,d8402~1,d8405~1,d8407~1,d840a~2,d840f~1,d8411~1,d8413~3,d8417~3,d841c~4,d8422~4,d8428~2,d842b~1,d842e~1,d8430~1,d8432~5,d843a~1,d843c~3,d8440~5,d8447~1,d844a~1,d844d~1,d844f~1,d8452~1,d8455~1,d8457~1,d8459~2,d845d~2,d8460~1,d8462~3,d8467~1,d8469~1,d846b~2,d846e~2,d8471~4,d8476~6,d847d~3,d8482~1,d8484~1,d8486~2,d8489~2,d848c~1,d848e~1,d8491~4,d8499~1,d849c~3,d84a0~1,d84a2~1,d84a4~3,d84aa~2,d84ad~2,d84b1~4,d84b6~1,d84b8~1,d84ba~1,d84bc~1,d84be~1,d84c0~5,d84c7~4,d84ce~1,d84d0~1,d84d2~2,d84d6~2,d84db~1,d84dd~2,d84e2~2,d84e6~1,d84e9~1,d84eb~4,d84f0~4]
pool 6 'cephfs_cephstor1_data' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 128 pgp_num 128 last_change 1214952 flags
hashpspool stripe_width 0 application cephfs
pool 7 'cephfs_cephstor1_metadata' replicated size 3 min_size 1
crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change
1214952 flags hashpspool stripe_width 0 application cephfs

On 11.10.2018 at 20:47, David Turner wrote:
> My first guess is to ask what your crush rules are.  `ceph osd crush
> rule dump` along with `ceph osd pool ls detail` would be helpful.  Also,
> a `ceph status` output from a time when the VM RBDs aren't working
> might explain something.
>
> On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG
> <n.fahldieck@xxxxxxxxxxxx> wrote:
>
>     Hi everyone,
>
>     For some time we have been experiencing service outages in our Ceph
>     cluster whenever there is any change to the HEALTH status, e.g. when
>     swapping storage devices, adding storage devices, rebooting Ceph
>     hosts or during backfills.
>
>     Just now I had a situation where several VMs hung after I rebooted
>     one Ceph host. We have 3 replicas for each PG, 3 mons, 3 mgrs, 3 MDS
>     and 71 OSDs spread over 9 hosts.
>
>     We use Ceph as a storage backend for our Proxmox VE (PVE) environment.
>     The outages show up as blocked file systems inside the virtual
>     machines running in our PVE cluster.
>
>     It feels similar to stuck and inactive PGs to me. Honestly, though,
>     I'm not really sure how to debug this problem or which log files to
>     examine.
>
>     OS: Debian 9
>     Kernel: 4.12 based upon SLE15-SP1
>
>     # ceph version
>     ceph version 12.2.8-133-gded2f6836f
>     (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
>
>     Can someone guide me? I'm more than happy to provide more information
>     as needed.
>
>     Thanks in advance
>     Nils
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
