M=1 is never a good choice. Just use replication instead.

> On Jun 26, 2020, at 3:05 AM, Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>
> Hi Janne,
>
> I use the default profile (2+1) and have set failure-domain=host. Is that
> best practice?
>
> Janne Johansson <icepic.dz@xxxxxxxxx> wrote on Fri, Jun 26, 2020 at 4:59 PM:
>
>> On Fri, 26 Jun 2020 at 10:32, Zhenshi Zhou <deaderzzs@xxxxxxxxx> wrote:
>>
>>> Hi all,
>>>
>>> I'm going to deploy a cluster with an erasure-coded pool for cold storage.
>>> I have 3 servers to set up the cluster, with 12 OSDs on each server.
>>> If I set the EC profile to k=4 and m=2, does that mean the data is still
>>> safe while 1/3 of the cluster's OSDs are down, or only while 2 OSDs are
>>> down?
>>
>> By default, CRUSH will want to place each part (6 in your case for EC 4+2)
>> on a host of its own, to maximize data safety. Since you can't do that with
>> 3 hosts, you must make sure no more than 2 pieces ever end up on a single
>> host. So you can't simply move from failure-domain=host to
>> failure-domain=osd: that could place all 6 pieces on the same host, on
>> different OSDs, which would be bad.
>>
>> You need to make the CRUSH rule pick two different OSDs per host, but no
>> more. One way is to build a tree where each host has half of its OSDs in
>> one branch and the other half in another (let's call them "subhosts" in
>> this example). That gives you 3*2 subhosts; if you make CRUSH pick
>> placement from the subhosts, it will always put two pieces on each host,
>> never two on the same OSD, and it will tolerate one host being down for a
>> while.
>>
>> I would also add that data is not very secure when you have no redundancy
>> left at all. Machines will crash, and they will need maintenance, patches,
>> BIOS updates and the like; having NO redundancy during planned or unplanned
>> downtime puts the data at huge risk, since _any_ surprise in that situation
>> would immediately lead to data loss.
>>
>> Also, if one box dies, the cluster can't run and can't recover until you
>> get a new host back in, so you are already running at the edge of data
>> safety in the normal case. Even if this will "work", Ceph, being a cluster,
>> really should have N+1 hosts or more if your data split (replication factor
>> or EC k+m) is equal to N.
>>
>> --
>> May the most significant bit of your life be positive.
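
To put a concrete sketch next to Janne's suggestion above: instead of building
the subhost tree, the same "no more than two chunks per host" constraint can
also be expressed directly in a CRUSH rule. The rule name and id below are made
up for illustration, and it assumes the usual root bucket named "default";
treat it as a sketch to adapt, not a drop-in config:

    rule ec42_two_per_host {
            id 2                                 # any unused rule id
            type erasure
            step set_chooseleaf_tries 5          # retry tunables typically used for EC rules
            step set_choose_tries 100
            step take default
            step choose indep 3 type host        # pick 3 distinct hosts
            step chooseleaf indep 2 type osd     # pick 2 distinct OSDs under each host
            step emit                            # 3 hosts * 2 OSDs = 6 = k+m chunks
    }

Added to a decompiled CRUSH map and injected the usual way, then assigned to
the pool (pool name is a placeholder):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # add the rule above to crush.txt, then:
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new
    ceph osd pool set <your-ec-pool> crush_rule ec42_two_per_host

However you express it, Janne's warning still holds: with 4+2 spread
two-per-host across 3 hosts, losing a single host takes out 2 chunks and
leaves no redundancy margin until that host is back.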