Re: I/O hangs when one of three nodes is down

Grigori Frolov <gfrolov@xxxxxxxxx> · Thu, 7 Jun 2018 13:40:38 +0000

root@testk8s1:~# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 12 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11 flags hashpspool stripe_width 0
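For reference, here is a minimal Python sketch of what these settings imply. The helper names are hypothetical. It assumes the Jewel-era `ceph osd pool ls detail` line format shown above, with one replica per host. With `size 3 min_size 2`, a placement group keeps accepting I/O while at least two of its three replicas are available. Losing a single host should therefore not, by itself, stop RADOS I/O.

```python
import re

# Matches the start of a replicated-pool line from `ceph osd pool ls detail`.
POOL_LINE = re.compile(
    r"pool (\d+) '([^']+)' replicated size (\d+) min_size (\d+)")


def parse_pools(text):
    """Return {pool_name: (size, min_size)} for each replicated pool line."""
    pools = {}
    for line in text.splitlines():
        m = POOL_LINE.search(line)
        if m:
            pools[m.group(2)] = (int(m.group(3)), int(m.group(4)))
    return pools


def io_survives(size, min_size, failed_hosts):
    """True if the replicas left after losing `failed_hosts` hosts still
    meet min_size (assumes each replica sits on a different host)."""
    return size - failed_hosts >= min_size


# Truncated copies of the pool lines posted above.
listing = """\
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0
"""

for name, (size, min_size) in parse_pools(listing).items():
    print(f"{name}: one host down -> I/O possible: {io_survives(size, min_size, 1)}")
```

By the same arithmetic, losing two of the three hosts would leave one replica. That is below `min_size`, so those PGs would stop serving I/O.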

I haven't changed any CRUSH rules. Here's the dump:

root@testk8s1:~# ceph osd crush rule dump 
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
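This is the stock replicated rule. It runs `take default`, then `chooseleaf_firstn` with `num 0` over buckets of type `host`, then `emit`. CRUSH interprets `num 0` as "as many replicas as the pool size". Below is a small Python sketch (function name hypothetical) that reads the JSON dump and reports each rule's chooseleaf step, i.e. its failure domain:

```python
import json


def chooseleaf_steps(rule_dump_json):
    """List (rule_name, num, bucket_type) for every chooseleaf step in the
    JSON printed by `ceph osd crush rule dump`."""
    steps = []
    for rule in json.loads(rule_dump_json):
        for step in rule["steps"]:
            if step["op"].startswith("chooseleaf"):
                steps.append((rule["rule_name"], step["num"], step["type"]))
    return steps


# The rule dump posted above, compacted.
dump = """[{"rule_id": 0, "rule_name": "replicated_ruleset", "ruleset": 0,
  "type": 1, "min_size": 1, "max_size": 10,
  "steps": [{"op": "take", "item": -1, "item_name": "default"},
            {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
            {"op": "emit"}]}]"""

print(chooseleaf_steps(dump))
```

There are three hosts and the pool size is 3, so each host holds exactly one copy of every object. This matches the `1075/3225 objects degraded (33.333%)` line in `ceph -s`: one copy in three is missing.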

Kind regards, Grigori

From: Paul Emmerich <paul.emmerich@xxxxxxxx>
Sent: 7 June 2018 18:26
To: Grigori Frolov
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: I/O hangs when one of three nodes is down

Can you post your pool configuration?

 ceph osd pool ls detail

and the CRUSH rule if you modified it.

Paul

2018-06-07 14:52 GMT+02:00 Фролов Григорий <gfrolov@xxxxxxxxx>:

Hello. Could you please help me troubleshoot an issue?

I have three nodes in the cluster:

ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.02637 root default                                        
-2 0.00879     host testk8s3                                   
 0 0.00879         osd.0          up  1.00000          1.00000 
-3 0.00879     host testk8s1                                   
 1 0.00879         osd.1        down        0          1.00000 
-4 0.00879     host testk8s2                                   
 2 0.00879         osd.2          up  1.00000          1.00000 
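To see which host's OSD is down in a listing like this, group each OSD line under the host line that precedes it. Below is a minimal Python sketch of that approach. It assumes the plain-text column layout shown above (ID, WEIGHT, TYPE NAME, UP/DOWN, ...), and the function name is hypothetical:

```python
def osd_status_by_host(tree_text):
    """Map osd name -> (host, 'up'/'down') from plain `ceph osd tree` output."""
    status = {}
    host = None
    for line in tree_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[2] == "host":
            host = fields[3]  # OSD lines that follow belong to this host
        elif fields[2].startswith("osd."):
            status[fields[2]] = (host, fields[3])
    return status


tree = """\
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02637 root default
-2 0.00879     host testk8s3
 0 0.00879         osd.0          up  1.00000          1.00000
-3 0.00879     host testk8s1
 1 0.00879         osd.1        down        0          1.00000
-4 0.00879     host testk8s2
 2 0.00879         osd.2          up  1.00000          1.00000
"""

down = [osd for osd, (host, state) in osd_status_by_host(tree).items()
        if state == "down"]
print(down)
```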

Each node runs ceph-osd, ceph-mon and ceph-mds. When all nodes are up, everything works fine.

When any of the three nodes goes down, whether it shuts down gracefully or hard, the remaining nodes can neither read from nor write to the directory where the Ceph storage is mounted. They also cannot unmount the volume. Every process that touches the directory hangs forever in uninterruptible sleep. When I try to strace such a process, strace hangs too. When the failed node comes back up, every hung process finishes successfully.

So what could cause the issue?

root@testk8s2:~# ps -eo pid,stat,cmd | grep ls
 3700 D    ls --color=auto /mnt/db
 3997 S+   grep --color=auto ls

root@testk8s2:~# strace -p 3700&
[1] 4020
root@testk8s2:~# strace: Process 3700 attached

root@testk8s2:~# ps -eo pid,stat,cmd | grep strace
 4020 S    strace -p 3700

root@testk8s2:~# umount /mnt&
[2] 4084
root@testk8s2:~# ps -eo pid,state,cmd | grep umount
 4084 D umount /mnt
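The `ps ... | grep` checks above look for state `D` (uninterruptible sleep). You can make the same check by reading `/proc` directly. Below is a minimal, Linux-only Python sketch with hypothetical function names. It splits `/proc/<pid>/stat` at the last `)`, because the command name in parentheses may itself contain spaces or parentheses:

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may contain spaces, so split on the
    last ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]


def processes_in_d_state(proc="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            yield pid, comm


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in processes_in_d_state():
        print(pid, comm)
```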

root@testk8s2:~# ceph -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
root@testk8s2:~# ceph -s
    cluster 0bcc00ec-731a-4734-8d76-599f70f06209
     health HEALTH_ERR
            80 pgs degraded
            80 pgs stuck degraded
            80 pgs stuck unclean
            80 pgs stuck undersized
            80 pgs undersized
            recovery 1075/3225 objects degraded (33.333%)
            mds rank 2 has failed
            mds cluster is degraded
            1 mons down, quorum 1,2 testk8s2,testk8s3
     monmap e1: 3 mons at {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
            election epoch 120, quorum 1,2 testk8s2,testk8s3
      fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
     osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
            1530 MB used, 16878 MB / 18408 MB avail
            1075/3225 objects degraded (33.333%)
                  80 active+undersized+degraded
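One way to read this output is sketched in Python below. The helper names are hypothetical, and the parsing assumes the Jewel `ceph -s` layout shown above. Treating the `fsmap` counters as up/in/max_mds is also an assumption.

Every PG state still contains `active`, so with `min_size 2` the OSDs should still be serving I/O. The `fsmap` line, by contrast, reports three MDS ranks. Only two of them are up, one has failed, and no `up:standby` daemon is listed to take over the failed rank. That points toward the metadata servers rather than the OSDs, though this part of the thread does not confirm the cause.

```python
import re


def pg_states(status_text):
    """Map each PG state line of the pgmap section (e.g. '80 active+clean')
    to its PG count; health lines such as '80 pgs degraded' do not match."""
    states = {}
    for line in status_text.splitlines():
        m = re.fullmatch(r"\s*(\d+) ([a-z_]+(?:\+[a-z_]+)*)\s*", line)
        if m:
            states[m.group(2)] = int(m.group(1))
    return states


def mds_summary(status_text):
    """Parse 'fsmap eN: U/I/M up {rank=name=state,...}[, F failed]'."""
    m = re.search(
        r"fsmap e\d+: (\d+)/(\d+)/(\d+) up \{([^}]*)\}(?:, (\d+) failed)?",
        status_text)
    if m is None:
        raise ValueError("no fsmap line found")
    ranks = {}
    for item in filter(None, m.group(4).split(",")):
        rank, name, state = item.split("=")
        ranks[int(rank)] = (name, state)
    return {
        "up": int(m.group(1)),
        "in": int(m.group(2)),
        "max_mds": int(m.group(3)),  # assumed meaning of the third counter
        "ranks": ranks,
        "failed": int(m.group(5) or 0),
        "standby": "up:standby" in status_text,
    }


# Relevant lines copied from the `ceph -s` output above.
status = """\
      fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
            80 pgs degraded
                  80 active+undersized+degraded
"""

states = pg_states(status)
mds = mds_summary(status)
print("all PGs active:", all("active" in s.split("+") for s in states))
print("MDS ranks up:", mds["up"], "of", mds["max_mds"],
      "failed:", mds["failed"], "standby listed:", mds["standby"])
```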

Thanks.

Kind regards, Grigori

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
