Hello,

lots of similar questions in the past, Google is your friend.

On Mon, 5 Jun 2017 23:59:07 -0400 Daniel K wrote:

> I've built 'my-first-ceph-cluster' with two of the 4-node, 12 drive
> Supermicro servers and dual 10Gb interfaces(one cluster, one public)
>
> I now have 9x 36-drive supermicro StorageServers made available to me, each
> with dual 10GB and a single Mellanox IB/40G nic. No 1G interfaces except
> IPMI. 2x 6-core 6-thread 1.7ghz xeon processors (12 cores total) for 36
> drives. Currently 32GB of ram. 36x 1TB 7.2k drives.
>

I love using IB, but alas with just one port per host you're likely best
off ignoring it, unless you have a converged network/switches that can
make use of it (or you run it in Ethernet mode).

> Early usage will be CephFS, exported via NFS and mounted on ESXi 5.5 and
> 6.0 hosts(migrating from a VMWare environment), later to transition to
> qemu/kvm/libvirt using native RBD mapping. I tested iscsi using lio and saw
> much worse performance with the first cluster, so it seems this may be the
> better way, but I'm open to other suggestions.
>

I've never seen an ultimate solution to providing HA iSCSI on top of Ceph,
though other people here have made significant efforts.

> Considerations:
> Best practice documents indicate .5 cpu per OSD, but I have 36 drives and
> 12 CPUs. Would it be better to create 18x 2-drive raid0 on the hardware
> raid card to present a fewer number of larger devices to ceph? Or run
> multiple drives per OSD?
>

You're definitely underpowered in the CPU department (36 OSDs at 0.5 cores
each would want 18 cores, and you have 12), and I personally would go with
RAID1s or 10s so you never have to re-balance an OSD. But if space is an
issue, RAID0s would do. OTOH, w/o any SSDs in the game, your HDD-only
cluster is going to be less CPU hungry than others.

> There is a single 256gb SSD which i feel would be a bottleneck if I used it
> as a journal for all 36 drives, so I believe bluestore with a journal on
> each drive would be the best option.
>

Bluestore doesn't have journals per se, and unless you're going to wait
for Luminous I wouldn't recommend using Bluestore in production. Hell, I
won't be using it any time soon, but anything pre-L sounds like outright
channeling Murphy to smite you.
That said, what SSD is it? Bluestore's WAL needs are rather small (there's
a rough sketch of how that could be laid out further down).
OTOH, a single SSD isn't something I'd recommend either, SPOF and all.
I'm guessing you have no budget to improve on that gift horse?

> Is 1.7Ghz too slow for what I'm doing?
>

If you're going to have a lot of small I/Os, it probably will be.

> I like the idea of keeping the public and cluster networks separate.

I don't, at least not on a physical level, when you pay for it by losing
redundancy. Do you have 2 switches, and are they MC-LAG capable (aka
stackable)?

> Any
> suggestions on which interfaces to use for what? I could theoretically push
> 36Gb/s, figuring 125MB/s for each drive, but in reality will I ever see
> that?

Not by a long shot, even with Bluestore. With the WAL and other bits on
SSD and very kind write patterns, maybe 100MB/s per drive, but IIRC there
were issues with current Bluestore and performance as well.

> Perhaps bond the two 10GB and use them as the public, and the 40gb as
> the cluster network? Or split the 40gb in to 4x10gb and use 3x10GB bonded
> for each?
>

If you can actually split it up, see above: MC-LAG. That will give you
60Gb/s, half that if a switch fails, and if it makes you feel better, do
the cluster and public networks as VLANs on top of that. But that will
cost you in not-so-cheap switch ports, of course.
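If you do go that route, the ceph.conf side of the split is trivial; a
rough sketch, with made-up subnets (10.0.10.0/24 public, 10.0.20.0/24
cluster):

  [global]
      public network  = 10.0.10.0/24
      cluster network = 10.0.20.0/24

OSD replication traffic moves to whatever interface holds a cluster
network address; leave the option out and everything simply runs over the
public network.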
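On the host side it would look roughly like this on a Debian-style box
with ifenslave and the vlan package (interface names, VLAN IDs and
addresses are made up, adjust to your hardware):

  # /etc/network/interfaces -- LACP bond across both 10Gb ports
  auto bond0
  iface bond0 inet manual
      bond-slaves enp3s0f0 enp3s0f1
      bond-mode 802.3ad
      bond-miimon 100
      bond-xmit-hash-policy layer3+4

  # public network, VLAN 10
  auto bond0.10
  iface bond0.10 inet static
      address 10.0.10.11
      netmask 255.255.255.0

  # cluster network, VLAN 20
  auto bond0.20
  iface bond0.20 inet static
      address 10.0.20.11
      netmask 255.255.255.0

The 802.3ad mode is exactly what needs that MC-LAG/stacked switch pair;
with two independent switches you're down to active-backup and half the
bandwidth.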
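And since the Bluestore question will come up again once Luminous is out:
about all that single 256GB SSD is good for with 36 OSDs is a small WAL
per OSD. With ceph-volume that would look something like this (Luminous
syntax from memory, so verify against the docs; sdc and the ssd_vg/wal-sdc
LV are of course placeholders):

  # data on the HDD, WAL on a small LV carved from the shared SSD
  ceph-volume lvm create --bluestore \
      --data /dev/sdc \
      --block.wal ssd_vg/wal-sdc

At roughly 7GB of SSD per OSD there's no room for a useful block.db, so
that stays on the HDD. And as said above, one SSD under 36 OSDs is a SPOF;
lose it and all 36 go down with it.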
Christian

> If there is a more appropriate venue for my request, please point me in
> that direction.
>
> Thanks,
> Dan

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications