Hi Vasu,
Thank you for your answer.
Yes, all the pools have min_size 1:
root@uhu2 /scripts # ceph osd lspools
0 rbd,1 cephfs_data,2 cephfs_metadata,
root@uhu2 /scripts # ceph osd pool get cephfs_data min_size
min_size: 1
root@uhu2 /scripts # ceph osd pool get cephfs_metadata min_size
min_size: 1
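For completeness, since size and min_size only make sense together (min_size 1 should allow I/O with a single surviving replica), here is a quick way to dump both settings for every pool. This is just a sketch, with the pool names taken from the lspools output above:
root@uhu2 /scripts # for p in rbd cephfs_data cephfs_metadata; do
>   echo "== $p =="
>   ceph osd pool get $p size      # total number of replicas
>   ceph osd pool get $p min_size  # minimum replicas needed to accept I/O
> done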
I stopped all the Ceph services gracefully on the first machine. But
just to get this straight: what if the first machine really suffered a
catastrophic failure? My expectation was that the second machine would
just keep running and serving files. That is why we are using a cluster
in the first place... Or is that expectation already wrong?
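One thing I am not sure about (this is an assumption on my side, since it depends on the monitor layout): with only two monitors, one per node, losing one node would also cost the monitor quorum, and then the cluster stops accepting requests no matter what min_size says. That can be checked on the surviving node:
root@uhu2 /scripts # ceph mon stat                            # how many mons exist, who is in quorum
root@uhu2 /scripts # ceph quorum_status --format json-pretty  # detailed quorum view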
When I stop the services on node1, I get this:
# ceph pg stat
2016-09-29 11:51:09.514814 7fcba012f700 0 -- :/1939885874 >>
136.243.82.227:6789/0 pipe(0x7fcb9c05a730 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7fcb9c05c3f0).fault
v41732: 264 pgs: 264 active+clean; 18514 MB data, 144 GB used, 3546 GB /
3690 GB avail; 1494 B/s rd, 0 op/s
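(The fault line above is just the client timing out against the monitor on the stopped node before retrying; the pg stat itself still comes through, and all 264 PGs report active+clean.) To see whether anything is actually blocked, the checks from the monitoring guide Vasu linked would be something like:
root@uhu2 /scripts # ceph health detail           # lists any PGs that are not active+clean
root@uhu2 /scripts # ceph pg dump_stuck unclean   # PGs stuck in a non-clean state
root@uhu2 /scripts # ceph osd tree                # which OSDs are up or down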
So my question still stands: is there a way to avoid such a situation,
preferably automatically? Or at least a way to manually tell the second
node to keep working and forget about those files?
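From the documentation, the manual escape hatch seems to be marking the dead OSDs as lost, although I am not sure it is the right approach here. A sketch only, where the OSD id 0 is a placeholder:
root@uhu2 /scripts # ceph osd set noout      # optional: don't rebalance while node1 is down
root@uhu2 /scripts # ceph osd lost 0 --yes-i-really-mean-it  # 0 = placeholder id; abandons data that only lived on that OSD!
root@uhu2 /scripts # ceph osd unset noout    # once node1 is back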
BR,
Ranjan
On 28.09.2016 at 18:25, Vasu Kulkarni wrote:
Are all the pools using min_size 1? Did you check pg stat and see which ones
are waiting? Some steps to debug further are at
http://docs.ceph.com/docs/jewel/rados/operations/monitoring-osd-pg/
Also, did you shut down the server abruptly while it was busy?