Hi Christian,

On 04/02/15 02:39, "Christian Balzer" <chibi@xxxxxxx> wrote:

>On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
>
>> Hi all,
>> I have to build a new Ceph storage cluster. After I've read the
>> hardware recommendations and some mail from this mailing list, I
>> would like to buy these servers:
>>
>
>Nick mentioned a number of things already I totally agree with, so don't
>be surprised if some of this feels like a repeat.
>
>> OSD:
>> SSG-6027R-E1R12L ->
>> http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
>> Intel Xeon e5-2630 v2
>> 64 GB RAM
>
>As Nick said, v3 and more RAM might be helpful, depending on your use
>case (small writes versus large ones); even faster CPUs as well.

OK, we'll switch from v2 to v3 and from 64 to 96 GB of RAM.

>> LSI 2308 IT
>> 2 x SSD Intel DC S3700 400GB
>> 2 x SSD Intel DC S3700 200GB
>
>Why the separation of SSDs?
>They aren't going to be that busy with regards to the OS.

We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs
for journaling.

>Get a case like Nick mentioned with 2 2.5" bays in the back, put 2 DC
>S3700 400GBs in there (connected to onboard 6Gb/s SATA3), partition them
>so that you have a RAID1 for the OS and plain partitions for the journals
>of the now 12 OSD HDDs in your chassis.
>Of course this optimization in terms of cost and density comes with a
>price: if one SSD should fail, you will have 6 OSDs down.
>Given how reliable the Intels are this is unlikely, but something you
>need to consider.
>
>If you want to limit the impact of an SSD failure and have just 2 OSD
>journals per SSD, get a chassis like the one above and 4 DC S3700 200GB,
>RAID10 them for the OS and put 2 journal partitions on each.
>
>I did the same with 8 3TB HDDs and 4 DC S3700 100GB; the HDDs (and CPU
>with 4KB IOPS) are the limiting factor, not the SSDs.
>
>> 8 x HDD Seagate Enterprise 6TB
>
>Are you really sure you need that density? One disk failure will result
>in a LOT of data movement once these become somewhat full.
>If you were to go for a 12 OSD node as described above, consider 4TB ones
>for the same overall density, while having more IOPS and likely the same
>price or less.

We chose the 6TB disks because we need a lot of storage in a small number
of servers, and we prefer servers that don't have too many disks. However,
we plan to fill each 6TB disk to at most 80%.
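To put rough numbers on the rebalancing impact, here is a quick
back-of-the-envelope sketch. The cluster size, the 3x replication and the
per-disk figures are assumptions of mine for illustration, not anything
measured:

# Back-of-the-envelope impact of losing one 6TB OSD (illustrative only).
# Assumed: 3x replication, 4 nodes, 8 x 6TB OSDs per node, <= 80% full.

raw_tb_per_osd = 6.0
fill_ratio = 0.8
osds_per_node = 8
nodes = 4
replication = 3

raw_total = raw_tb_per_osd * osds_per_node * nodes
usable_total = raw_total * fill_ratio / replication

# When one OSD dies, roughly everything it held has to be re-replicated
# elsewhere before the cluster is healthy again.
data_to_move_tb = raw_tb_per_osd * fill_ratio

print(f"raw capacity:                  {raw_total:.1f} TB")
print(f"usable (3x, 80% full):         {usable_total:.1f} TB")
print(f"backfill after losing one OSD: ~{data_to_move_tb:.1f} TB")
print(f"that is ~{data_to_move_tb / (raw_total * fill_ratio) * 100:.1f}% "
      f"of all raw data on the cluster being rewritten")

Nearly 5TB of backfill per failed OSD is the price of the 6TB drives; the
4TB / 12-OSD layout Christian suggests would move proportionally less per
failure.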
>> 2 x 40GbE for backend network
>
>You'd be lucky to write more than 800MB/s sustained to your 8 HDDs
>(remember they will have to deal with competing reads and writes, this is
>not a sequential synthetic write benchmark).
>Incidentally 1GB/s to 1.2GB/s (depending on configuration) would also be
>the limit of your journal SSDs.
>Other than backfilling caused by cluster changes (OSD removed/added), your
>limitation is nearly always going to be IOPS, not bandwidth.

OK, after some discussion we'll switch to 2 x 10GbE (a quick sanity check
of the bandwidth numbers is at the end of this mail).

>So 2x10GbE or, if you're comfortable with it (I am ^o^), an Infiniband
>backend (can be cheaper, less latency, plans for RDMA support in
>Ceph) should be more than sufficient.
>
>> 2 x 10GbE for public network
>>
>> META/MON:
>>
>> SYS-6017R-72RFTP ->
>> http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
>> 2 x Intel Xeon e5-2637 v2
>> 4 x SSD Intel DC S3500 240GB raid 1+0
>
>You're likely to get better performance and of course MUCH better
>durability by using 2 DC S3700, at about the same price.

OK, we'll switch to 2 x SSD DC S3700.

>> 128 GB RAM
>
>Total overkill for a MON, but I have no idea about MDS and RAM never
>hurts.

OK, we'll switch from 128 to 96 GB.

>In your follow-up you mentioned 3 mons. I would suggest putting 2 more
>mons (only, not MDS) on OSD nodes and make sure that within the IP
>numbering the "real" mons have the lowest IP addresses, because the MON
>with the lowest IP becomes master (and thus the busiest).
>This way you can survive a loss of 2 nodes and still have a valid quorum.

OK, got it (a small illustration of the IP ordering is at the end of this
mail).

>Christian
>
>> 2 x 10 GbE
>>
>> What do you think?
>> Any feedback, advice, or ideas are welcome!
>>
>> Thanks so much
>>
>> Regards,
>
>
>--
>Christian Balzer        Network/Systems Engineer
>chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
>http://www.gol.com/

Thanks so much!
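PS: a quick sanity check of the bandwidth numbers discussed above. The
per-device figures are my own assumptions, not measurements:

# Rough per-node bandwidth sanity check (assumed figures only).

hdd_count = 8
hdd_sustained_mb_s = 100     # mixed read/write on a 7.2k enterprise HDD
journal_ssd_count = 2
ssd_write_mb_s = 460         # roughly what a DC S3700 400GB sustains

# Every write goes through the journal SSDs and then to the HDDs on this
# node, so the node tops out at whichever side is slower.
hdd_limit = hdd_count * hdd_sustained_mb_s
ssd_limit = journal_ssd_count * ssd_write_mb_s
node_limit_mb_s = min(hdd_limit, ssd_limit)

# Raw link bandwidth of the two backend options, for comparison.
links = {"2 x 10GbE": 2 * 10_000 / 8, "2 x 40GbE": 2 * 40_000 / 8}

print(f"HDD limit:  {hdd_limit} MB/s")
print(f"SSD limit:  {ssd_limit} MB/s")
print(f"node limit: {node_limit_mb_s} MB/s")
for name, mb_s in links.items():
    print(f"{name}: {mb_s:.0f} MB/s of raw link bandwidth")

So even with replication traffic on top, 2 x 10GbE leaves a lot of
headroom over what the disks in one node can actually write.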
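PPS: a small illustration of the monitor IP ordering Christian describes
(the hostnames and addresses are made up). As I understand it, Ceph ranks
monitors by address, lowest first, and the lowest rank leads the quorum:

# Illustrative only: order monitors the way Ceph ranks them (by IP),
# so the dedicated MON nodes should get the lowest addresses.
import ipaddress

mons = {
    "mon-a (dedicated)":      "10.0.0.11",
    "mon-b (dedicated)":      "10.0.0.12",
    "mon-c (dedicated)":      "10.0.0.13",
    "osd-node-1 (extra mon)": "10.0.0.21",
    "osd-node-2 (extra mon)": "10.0.0.22",
}

by_rank = sorted(mons.items(), key=lambda kv: ipaddress.ip_address(kv[1]))
for rank, (name, ip) in enumerate(by_rank):
    role = "leader" if rank == 0 else "peon"
    print(f"rank {rank}: {name:<24} {ip:<12} -> {role}")

With 5 mons laid out like this, losing any 2 nodes still leaves 3 monitors
in quorum, and the leader stays on a dedicated MON node.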