Can you post your pool configuration?
ceph osd pool ls detail
and the crush rule if you modified it.
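Roughly, a quick sketch of how to gather that; nothing below is specific to your setup, these are stock ceph CLI calls:

ceph osd pool ls detail     # per-pool size, min_size and crush rule
ceph osd crush rule dump    # the CRUSH rules as JSON
ceph osd tree               # how hosts/OSDs sit under those rules

The size/min_size pair in the pool listing is what decides whether a PG keeps serving I/O while one OSD is down.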
Paul
2018-06-07 14:52 GMT+02:00 Фролов Григорий <gfrolov@xxxxxxxxx>:
Hello. Could you please help me troubleshoot the following issue?
I have 3 nodes in a cluster.
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02637 root default
-2 0.00879     host testk8s3
 0 0.00879         osd.0           up  1.00000          1.00000
-3 0.00879     host testk8s1
 1 0.00879         osd.1         down        0          1.00000
-4 0.00879     host testk8s2
 2 0.00879         osd.2           up  1.00000          1.00000
Each node runs ceph-osd, ceph-mon and ceph-mds.
So when all nodes are up, everything is fine.
When any of the 3 nodes goes down, whether it shuts down gracefully or is killed hard, the remaining nodes can no longer read or write the directory where the Ceph storage is mounted, and they cannot unmount the volume either. Every process touching that directory hangs forever in uninterruptible sleep. When I try to strace such a process, strace hangs too. When the failed node comes back up, each hung process finishes successfully.
So what could cause the issue?
root@testk8s2:~# ps -eo pid,stat,cmd | grep ls
3700 D  ls --color=auto /mnt/db
3997 S+ grep --color=auto ls
root@testk8s2:~# strace -p 3700 &
[1] 4020
root@testk8s2:~# strace: Process 3700 attached
root@testk8s2:~# ps -eo pid,stat,cmd | grep strace
4020 S strace -p 3700
root@testk8s2:~# umount /mnt &
[2] 4084
root@testk8s2:~# ps -eo pid,state,cmd | grep umount
4084 D umount /mnt
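(For reference, a rough way to see where a task stuck in D state is actually blocked in the kernel, assuming PID 3700 from the ls above:

cat /proc/3700/stack                       # kernel stack of the hung task (needs root, if available on the kernel)
ps -eo pid,stat,wchan:32,cmd | grep ' D'   # wait channel for everything in uninterruptible sleep
dmesg | grep -i 'blocked for more'         # the kernel logs hung tasks after ~120s

If this is the CephFS kernel client, the stack would typically show ceph/libceph functions waiting on the cluster.)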
root@testk8s2:~# ceph -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
root@testk8s2:~# ceph -s
    cluster 0bcc00ec-731a-4734-8d76-599f70f06209
     health HEALTH_ERR
            80 pgs degraded
            80 pgs stuck degraded
            80 pgs stuck unclean
            80 pgs stuck undersized
            80 pgs undersized
            recovery 1075/3225 objects degraded (33.333%)
            mds rank 2 has failed
            mds cluster is degraded
            1 mons down, quorum 1,2 testk8s2,testk8s3
     monmap e1: 3 mons at {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
            election epoch 120, quorum 1,2 testk8s2,testk8s3
      fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
     osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
            1530 MB used, 16878 MB / 18408 MB avail
            1075/3225 objects degraded (33.333%)
                  80 active+undersized+degraded
Thanks.
kind regards, Grigori
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90