Our experimental Ceph cluster is performing terribly (with the operator to blame!), and while it's down to address some issues, I'm curious to hear advice on the following ideas.

The cluster:
- two disk nodes (6 CPUs, 16GB RAM each)
- 8 OSDs (4 per node)
- 3 monitors
- 10Gb front + back networks
- 2TB Enterprise SATA drives
- HP RAID controller w/battery-backed cache
- one SSD journal drive per two OSDs

First, I'd like to play with taking one machine down while the other node continues to serve the cluster. To maintain redundancy in this scenario, I'm thinking of setting the pool size to 4 and min_size to 2, with the idea that a proper CRUSH map should always keep two copies on each disk node (a rough sketch of what I mean is at the end of this mail). Again, *this is for experimentation* and probably raises red flags for production, but I'm just asking whether it's *possible*: could one node go down and the other node continue to serve r/w data? Any anecdotes about performance differences between size=4 and size=3 in other clusters?

Second, does it make any sense to divide the CRUSH map into an extra level for the SSD disks, each of which holds journals for two OSDs? This might increase redundancy in case of a journal disk failure, but I seem to recall something about too few OSDs in a bucket causing problems with the CRUSH algorithm.

Thanks,
John
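
P.S. To make the first question concrete, this is roughly the CRUSH rule I had in mind for "two copies on each host" -- an untested sketch based on my reading of the decompiled crushmap format, where the rule name, ruleset number, and pool name "testpool" are placeholders I made up:

    rule two_copies_per_host {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            # pick 2 hosts, then 2 OSDs under each host, for 4 replicas total
            step take default
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osd
            step emit
    }

together with something along the lines of:

    ceph osd pool set testpool size 4
    ceph osd pool set testpool min_size 2
    ceph osd pool set testpool crush_ruleset 1

(crush_ruleset being the pool setting name as I understand it on the release we're running; I gather newer releases call it crush_rule). The choose-2-hosts / chooseleaf-2-OSDs pair is what I believe gives the 2 hosts x 2 copies layout, but corrections welcome if that's the wrong way to express it.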