Hi,

On Tue, 4 May 2010, Mickaël Canévet wrote:
> As there is more than one disk per server available for data (2 with 6
> disks and 2 with 10 disks, for a total of 32 disks over 4 nodes), I was
> wondering how to define OSDs.
>
> I have a choice between one OSD per disk (32 OSDs on the cluster) or one
> OSD per server with one btrfs filesystem over all disks of the server
> (4 OSDs on the cluster). Which one is the best solution?
>
> In the first case, if I lose one disk, I lose only a small part of the
> available space. In the other case, if I lose one disk, I lose the whole
> server (as the btrfs filesystem is striped), which is much more space.
>
> On the other hand, if I lose a whole server in the first case, I can
> lose all replicas of a piece of data, because they may be on two
> different OSDs on the same server.

Right. You can make btrfs do some replication (e.g. 2x metadata, 1x data)
to mitigate the risk somewhat, but it'll use more disk space. raid[56] in
btrfs is still a ways off. (There's a sketch of the mkfs invocation
below.)

> Is there a way to define OSD groups so that we can be sure that two
> replicas are not on OSDs of the same group? It could be useful for
> multiple OSDs per server, but also for multiple servers per computing
> room: if I lose a whole room, and with it a lot of servers, I want to
> be sure that I have not lost every replica.

This is probably the best route, though the process isn't streamlined
yet. It involves constructing a CRUSH map that describes the hierarchy of
disks and hosts, plus a rule that distributes replicas across hosts. I
threw together a wiki article at

    http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH

and appended a rough sketch of such a map below. Please let me know if
you have questions or run into problems.

sage
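
P.S. A minimal sketch of the btrfs setup I mean: raid1 metadata (two
copies, on different disks) with unreplicated data. The device names are
placeholders for whatever disks a given node actually has, and the exact
flags accepted may vary with your btrfs-progs version:

    # 2x metadata, 1x data, across three (hypothetical) disks
    mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc /dev/sdd

With that layout the filesystem metadata should survive losing a single
disk, though any data chunks that were on that disk are still gone. It's
damage control, not the full redundancy you'd get from per-disk OSDs with
cross-host replication.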
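
And here's a rough sketch of the kind of CRUSH map I mean for the
one-OSD-per-disk layout: one device per disk, grouped into host buckets,
and a rule that places each replica on a different host. All names, IDs,
and weights below are made up, only two of your four hosts are shown, and
the wiki page above has the exact syntax for your version:

    # types
    type 0 device
    type 1 host
    type 2 root

    # devices, one per disk (32 in total; four shown)
    device 0 device0
    device 1 device1
    device 2 device2
    device 3 device3

    host node0 {
            id -1
            alg straw
            hash 0
            item device0 weight 1.000
            item device1 weight 1.000
    }

    host node1 {
            id -2
            alg straw
            hash 0
            item device2 weight 1.000
            item device3 weight 1.000
    }

    root default {
            id -5
            alg straw
            hash 0
            item node0 weight 2.000
            item node1 weight 2.000
    }

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 0 type host    # n distinct hosts...
            step choose firstn 1 type device  # ...one disk in each
            step emit
    }

The same idea extends to your computing-room case: add a "room" type
above "host", wrap the host buckets in room buckets, and choose across
rooms first.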