Hi,

On Sat, 2011-07-02 at 15:30 +0200, Wilfrid Allembrand wrote:
> Hi everyone,
>
> I'm trying to figure out what is the best OSD solution with an
> infrastructure made up of servers with a lot of disks in each. Say, for
> example, you have 4+ nodes like Sun Fire X4500 (code-named Thumper).
> Each node has 48 disks.
>
> What are the pros and cons of building a Ceph cluster with btrfs on
> that kind of high-density hardware, considering the different
> scenarios for each server:
> - 1 OSD daemon per disk, so 48 OSD daemons per server

That would give you the maximum available storage, but you would need a
LOT of RAM and CPU power. I'm running 10 nodes with 4 OSDs each on Atoms
with 4GB of RAM, and that is already pretty heavy for those machines,
especially once you have a lot of PGs (Placement Groups) and objects.
Recovery then starts to take a lot of time and memory. (There is a
minimal ceph.conf sketch for this layout at the end of this mail.)

> - make 3 btrfs pools of 16 disks, so 3 OSD daemons per server
> - make 3 RAID 5 or 6 volumes, so 3 OSD daemons per server

You could try making btrfs pools of 12 or 16 disks, whatever you like,
but each pool then becomes a single point of failure: if btrfs fails
for some reason (bugs or so), you could lose a lot of data at once, and
recovering it could saturate the rest of your cluster. (See the mkfs
example at the end of this mail.) Using software RAID is a second
option, but do you really want to add yet another layer?

Running fewer OSDs would mean less memory overhead, but whether that
really matters, I'm not sure. The more data and PGs you add, the more
stress you put on your OSDs. The number of PGs follows from the number
of OSDs, so running fewer OSDs means fewer PGs, but how much of a
difference that makes, I'm also not sure. (There is a rough PG
calculation at the end of this mail.)

> From a performance and management point of view, would you recommend a
> lot of small servers or a small number of Thumper-like servers?

From what I know, get a lot of small machines with, let's say, 4 to 8
disks each. If one fails, the impact on the cluster will be much
smaller and recovery will take less time. Think about it: you have 3
"thumpers" with 48TB of storage each, and one fails; that is going to
be a heavy recovery.

Wido

> All the best,
> Wilfrid
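
P.S. A few concrete sketches, in case they help. First, what the
one-OSD-per-disk layout could look like in ceph.conf. This is only a
minimal sketch from memory; the hostname, device names and data paths
are placeholders you would replace with your own:

[osd]
        osd data = /srv/osd.$id
        osd journal = /srv/osd.$id.journal

[osd.0]
        host = thumper1
        btrfs devs = /dev/sdb   ; one whole disk per OSD daemon

[osd.1]
        host = thumper1
        btrfs devs = /dev/sdc

; ...and so on, up to osd.47 for the last disk in this node,
; then osd.48 onwards for the next node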
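
If you go for a few big btrfs pools instead, you create the
multi-device filesystem in one go. A sketch for one 16-disk pool (the
device names are again placeholders, and the raid0/raid1 profiles are
just one possible choice):

# stripe data across all 16 disks, mirror the metadata
mkfs.btrfs -d raid0 -m raid1 /dev/sd[b-q]

You would then mount that once and point a single OSD's "osd data" at
it, which is exactly why the whole pool becomes one failure domain.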
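
And the rough PG calculation I referred to above. The usual rule of
thumb (treat the exact numbers as approximate) is on the order of 100
PGs per OSD, divided by the replication level, rounded up to a power of
two:

  4 nodes x 48 OSDs = 192 OSDs
  192 x 100 / 3 replicas ~= 6400 -> round up to 8192 PGs

  versus 3 big pools per node:

  4 nodes x 3 OSDs = 12 OSDs
  12 x 100 / 3 replicas = 400 -> round up to 512 PGs

So the per-disk layout means roughly 16x as many PGs cluster-wide,
which gives you an idea of where the extra memory goes.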