Hi,
basically, with EC pools you usually have a min_size of k + 1 to
prevent data loss. There was a thread about that just a few days ago
on this list. So in your case your min_size is probably 9, which
pauses IO as soon as two chunks become unavailable. If your crush
failure domain is host (it seems like it is) and you have "only" 10
hosts, I'd recommend adding a host if possible so the cluster can
fully recover while one host is down. Otherwise the PGs stay degraded
until the host comes back.
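To verify that, you can check the pool's min_size and the crush
rule's failure domain, for example (replace <your-ec-pool> with your
actual pool name):

  ceph osd pool ls detail                   # shows size, min_size and crush_rule per pool
  ceph osd pool get <your-ec-pool> min_size
  ceph osd crush rule dump                  # the chooseleaf step should show "type": "host"
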
So in your case your cluster can handle only one down host, e.g. for
maintenance. If another host goes down (disk, network, whatever) you
hit the min_size limit. You can temporarily set min_size = k, but you
shouldn't take any unnecessary risks and should increase it back to
k + 1 after a successful recovery. It's not possible to change the EC
profile of an existing pool; you'd have to create a new pool with the
new profile and copy the data over.
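Just as a rough sketch (pool and profile names below are only
placeholders for yours), the temporary min_size change and a new pool
with a different profile would look something like this:

  # only while recovering, with k=8:
  ceph osd pool set <your-ec-pool> min_size 8
  # after recovery has finished, back to k + 1:
  ceph osd pool set <your-ec-pool> min_size 9

  # if you ever want a different profile (e.g. 8+3), create a new
  # profile and a new pool and migrate the data:
  ceph osd erasure-code-profile set ec-8-3 k=8 m=3 crush-failure-domain=host
  ceph osd pool create <new-pool> erasure ec-8-3
  # then copy the data, e.g. with "rados cppool" or on the client
  # side, depending on how the pool is used (RBD, CephFS, RGW).
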
Check out the EC docs [1] for more details.
Regards,
Eugen
[1]
https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery
Quoting Nguyễn Hữu Khôi <nguyenhuukhoinw@xxxxxxxxx>:
Hello guys.
I see many docs and threads talking about OSD failures. I have a
question: how many nodes in a cluster can fail?
I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
cluster crashes and cannot write anymore.
Thank you. Regards
Nguyen Huu Khoi
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx