Hi,

On Tue, 4 May 2010, Mickaël Canévet wrote:
> As there is more than one disk per server available for data (2 with 6
> disks and 2 with 10 disks, for a total of 32 disks over 4 nodes), I was
> wondering how to define OSDs.
>
> I have a choice between one OSD per disk (32 OSDs on the cluster) or one
> OSD per server with one btrfs filesystem over all disks of the server
> (4 OSDs on the cluster). Which one is the best solution?
>
> In the first case, if I lose one disk, I lose only a small part of the
> available space. In the other case, if I lose one disk, I lose the whole
> server (as the btrfs filesystem is striped), which is much more space.
>
> On the other hand, if I lose a whole server in the first case, I can
> lose all replicas of a piece of data, because they may be on two
> different OSDs on the same server.

Right. You can make btrfs do some replication (e.g. 2x metadata, 1x data)
to mitigate the risk somewhat, but it'll use more disk space. raid[56] in
btrfs is still a ways off. (There's a sketch of the mkfs invocation
below.)

> Is there a way to define OSD groups so that we can be sure that two
> replicas are not on OSDs of the same group? It could be useful for
> multiple OSDs per server, but also for multiple servers per computing
> room: if I lose a whole room, and with it a lot of servers, I want to
> be sure that I have not lost every replica.

This is probably the best route, though the process isn't streamlined
yet. It involves constructing a CRUSH map that describes the hierarchy of
disks and hosts, plus a rule that distributes replicas across hosts. I
threw together a wiki article at

    http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH

and appended a rough sketch of such a map below. Please let me know if
you have questions or run into problems.

sage
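
P.S. A minimal sketch of the btrfs setup I mean: raid1 metadata (two
copies, on different disks) with unreplicated data. The device names are
placeholders for whatever disks a given node actually has, and the exact
flags accepted may vary with your btrfs-progs version:

    # 2x metadata, 1x data, across three (hypothetical) disks
    mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc /dev/sdd

With that layout the filesystem metadata should survive losing a single
disk, though any data chunks that were on that disk are still gone. It's
damage control, not the full redundancy you'd get from per-disk OSDs with
cross-host replication.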
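
And here's a rough sketch of the kind of CRUSH map I mean for the
one-OSD-per-disk layout: one device per disk, grouped into host buckets,
and a rule that places each replica on a different host. All names, IDs,
and weights below are made up, only two of your four hosts are shown, and
the wiki page above has the exact syntax for your version:

    # types
    type 0 device
    type 1 host
    type 2 root

    # devices, one per disk (32 in total; four shown)
    device 0 device0
    device 1 device1
    device 2 device2
    device 3 device3

    host node0 {
            id -1
            alg straw
            hash 0
            item device0 weight 1.000
            item device1 weight 1.000
    }

    host node1 {
            id -2
            alg straw
            hash 0
            item device2 weight 1.000
            item device3 weight 1.000
    }

    root default {
            id -5
            alg straw
            hash 0
            item node0 weight 2.000
            item node1 weight 2.000
    }

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 0 type host    # n distinct hosts...
            step choose firstn 1 type device  # ...one disk in each
            step emit
    }

The same idea extends to your computing-room case: add a "room" type
above "host", wrap the host buckets in room buckets, and choose across
rooms first.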