> > Ceph has a default pool size of 3. Is it a bad idea to run a pool of
> > size 2? What about size 2 min_size 1?
> >
> min_size 1 is sensible, 2 obviously won't protect you against dual disk failures.
> Which happen and happen with near certainty once your cluster gets big
> enough.

I thought I saw somewhere in the docs that there could be issues with
min_size 1, but I can't seem to find it now.

> > I have a cluster I'm moving data into (on RBDs) that is full enough
> > with size 3 that I'm bumping into nearfull warnings. Part of that is
> > because of the amount of data, part is probably because of suboptimal
> > tuning (Proxmox VE doesn't support all the tuning options), and part
> > is probably because of unbalanced drive distribution and multiple
> > drive sizes.
> >
> > I'm hoping I'll be able to solve the drive size/distribution issue,
> > but in the meantime, what problems could the size and min_size
> > changes create (aside from the obvious issue of fewer replicas)?
>
> I'd address all those issues (setting the correct weight for your OSDs).
> Because it is something you will need to do anyway down the road.
> Alternatively add more nodes and OSDs.

I don't think it's a weighting issue. My weights seem sane (e.g., they
are scaled according to drive size). I think it's more an artifact
arising from a combination of factors:

- A relatively small number of nodes
- Some of the nodes having additional OSDs
- Those additional OSDs being 500GB drives, compared to the other OSDs
  being 1TB and 3TB drives
- Having to use older CRUSH tunables
- The cluster being around 72% full with that pool set to size 3

Running 'ceph osd reweight-by-utilization' clears the issue up
temporarily, but additional data inevitably causes certain OSDs to
become overloaded again.

> While setting the replica down to 2 will "solve" your problem, it will also
> create another one besides the reduced redundancy:
> It will reshuffle all your data, slowing down your cluster (to the point of
> becoming unresponsive if it isn't designed and configured well).
>
> Murphy might take those massive disk reads and writes as a clue to provide
> you with a double disk failure as well. ^o^

I actually already did the size 2 change on that pool before I sent my
original email. It was the only way I could get the data moved. It
didn't result in any data movement, just deletion. When I get new
drives I'll turn that knob back up.

Thanks for your input, by the way.
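
For reference, the size/min_size knobs being discussed are per-pool
settings. A minimal sketch, assuming a pool called 'rbd' (substitute
your own pool name):

    # check the current replication settings on the pool
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # drop to 2 copies; min_size 1 keeps I/O going with a single copy left
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1

    # going back to 3 copies later (min_size 2 is the usual companion)
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2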
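
On the weighting point, this is roughly how I sanity-check the CRUSH
weights; osd.12 and the 0.5 value below are made-up examples, the
convention being weight roughly equal to drive size in TB:

    # show the CRUSH hierarchy with per-OSD weights
    ceph osd tree

    # permanently adjust a CRUSH weight if one looks off
    ceph osd crush reweight osd.12 0.5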
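
And for the fullness side, what I've been doing amounts to the
following (the 110 threshold argument is optional and only an example,
and 'ceph osd df' needs a reasonably recent release):

    # overall and per-pool usage
    ceph df

    # per-OSD fill levels, to spot the outliers
    ceph osd df

    # temporary override weights on the most-full OSDs
    ceph osd reweight-by-utilization 110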