> FWIW, I tried using some 256G MX100s with ceph and had horrible
> performance issues within a month or two. I was seeing 100% utilization
> with high latency but only 20 MB/s writes. I had a number of S3500s in
> the same pool that were dramatically better. Which is to say that they
> were actually faster than the hard disk pool they were fronting, rather
> than slower.
>
> If you do go with MX200s, I'd recommend only using at most 80% of the
> drive; most cheap SSDs perform *much* better at sustained writes if you
> give them more overprovisioning space to work with.

I had planned to use at most 80GB of the available 250GB:

1 x 16GB for the OS
4 x 8, 12 or 16GB partitions for osd-journals

That gives a total SSD usage of 19.2%, 25.6% or 32%, and over-provisioning
of 80.8%, 74.4% or 68%.

I am relatively certain that those SSDs would last ages with THAT much
over-provisioning.
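
For anyone who wants to check the maths, a quick back-of-the-envelope
sketch (Python; it assumes the full advertised 250GB capacity and the
partition sizes listed above):

    # Quick sanity check of the usage / over-provisioning figures above.
    # Assumes the full advertised 250 GB capacity of the MX200.
    DRIVE_GB = 250
    OS_GB = 16

    for journal_gb in (8, 12, 16):
        used = OS_GB + 4 * journal_gb   # 1 x OS partition + 4 x osd-journal
        used_pct = 100.0 * used / DRIVE_GB
        op_pct = 100.0 - used_pct       # space left unpartitioned as over-provisioning
        print("4 x %2dGB journals -> %2dGB used = %4.1f%%, over-provisioning %4.1f%%"
              % (journal_gb, used, used_pct, op_pct))

    # 4 x  8GB journals -> 48GB used = 19.2%, over-provisioning 80.8%
    # 4 x 12GB journals -> 64GB used = 25.6%, over-provisioning 74.4%
    # 4 x 16GB journals -> 80GB used = 32.0%, over-provisioning 68.0%
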
But it still is a consumer-grade SSD, and it looks like those will be
exchanged for Samsung 845DC Pro 400GB SSDs, if there are no known issues
with those.

The added cost means a reduction in the number of nodes for the initial
setup (6 nodes with enterprise HDDs and SSDs instead of 8 with consumer
HDDs and SSDs).

I would have liked more nodes, but 6 is still a good number to start with.
Exactly 2 times better than the lowest reasonable minimum. :)

> Scott
>
> On Tue, Apr 28, 2015, 4:30 PM Dominik Hannen <hannen@xxxxxxxxx> wrote:
>
>> > It's all about the total latency per operation. Most IO sizes over
>> > 10GbE don't make much difference to the Round Trip Time. But
>> > comparatively even 128KB IOs over 1GbE take quite a while. For
>> > example, ping a host with a payload of 64k over 1GbE and 10GbE
>> > networks and look at the difference in times. Now double this for
>> > Ceph (Client -> Prim OSD -> Sec OSD).
>> >
>> > When you are using SSD journals you normally end up with a write
>> > latency of 3-4ms over 10GbE; 1GbE networking will probably increase
>> > this by another 2-4ms. IOPs = 1000 / latency (in ms).
>> >
>> > I guess it all really depends on how important performance is.
>>
>> I reckon we are talking about single-threaded IOPs? It looks like 10ms
>> latency is in the worst-case region.. 100 IOPs will do fine.
>>
>> At least in my understanding, a heavily multi-threaded load should be
>> able to get higher IOPs regardless of latency?
>>
>> Some presentation material suggested that the adverse effects of the
>> higher latency due to 1Gbit begin above IO sizes of 2k, so maybe there
>> is room to tune IOPs-hungry applications/VMs accordingly.
>>
>> > Just had a look and the Seagate Surveillance disks spin at 7200RPM
>> > (missed that you put that there), whereas the WD ones that I am
>> > familiar with spin at 5400RPM, so not as bad as I thought.
>> >
>> > So probably ok to use, but I don't see many people using them for
>> > Ceph / generic NAS, so I can't be sure there are no hidden gotchas.
>>
>> I am not sure how trustworthy newegg reviews are, but somehow I get
>> some doubts about them now.
>> I guess it does not matter that much, as long as no more than a disk a
>> month is failing? The 3-year warranty gives some hope..
>>
>> Are there some cost-efficient HDDs that someone can suggest? (Most
>> likely 3TB drives; that seems to be the sweet spot at the moment.)
>>
>> > Sorry, nothing in detail. I did actually build a ceph cluster on the
>> > same 8-core CPU as you have listed. I didn't have any performance
>> > problems, but I do remember that with SSD journals, when doing high
>> > queue depth writes, I could get the CPU quite high. It's like what I
>> > said before about the 1 vs 10Gb networking: how important is
>> > performance? If using this CPU gives you an extra 1ms of latency per
>> > OSD, is that acceptable?
>> >
>> > Agree, 12 cores (guessing 2.5GHz each) will be overkill for just 12
>> > OSDs. I have a very similar spec and see exactly the same as you, but
>> > will change the nodes to 1 CPU each when I expand and use the spare
>> > CPUs for the new nodes.
>> >
>> > I'm using this:
>> >
>> > http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPTL_.cfm
>> >
>> > Mainly because of rack density, which I know doesn't apply to you.
>> > But the fact that they share PSUs/rails/chassis helps reduce power a
>> > bit and drives down cost.
>> >
>> > I can get 14 disks in each and they have 10GbE on board. The SAS
>> > controller is flashable to JBOD mode.
>> >
>> > Maybe one of the other Twin solutions might be suitable?
>>
>> I did consider that exact model (it was mentioned on the list some time
>> ago). I could get about the same effective storage capacity with it,
>> but 10G networking is just too expensive on the switch side.
>>
>> Also, those nodes and 10G switches consume a lot more power.
>>
>> By my estimates and the numbers I found, the Avoton nodes should run at
>> about 55W each. The switches (EX3300), according to the tech specs,
>> would need 76W at max each.

Dominik

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
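
To make the "IOPs = 1000 / latency" rule of thumb quoted above concrete,
here is a rough sketch of how I understand it (Python; the 4ms and +3ms
figures are just the example numbers from this thread, and the linear
queue-depth scaling is the idealised best case, ignoring every other
bottleneck):

    # Rough model of the "IOPs = 1000 / latency(ms)" rule of thumb.
    # Illustrative figures only: ~4ms write latency with SSD journals over
    # 10GbE, plus ~3ms extra for 1GbE, as suggested earlier in the thread.

    def iops(latency_ms, queue_depth=1):
        # Queue depth 1: each write must complete before the next is issued.
        # Deeper queues keep more writes in flight, so in the idealised case
        # throughput scales linearly -- until CPU, the journal SSD or raw
        # 1Gbit bandwidth becomes the real limit.
        return queue_depth * 1000.0 / latency_ms

    LAT_10GBE = 4.0
    LAT_1GBE = LAT_10GBE + 3.0

    for qd in (1, 8, 32):
        print("QD %2d: 10GbE ~%5.0f IOPs, 1GbE ~%5.0f IOPs"
              % (qd, iops(LAT_10GBE, qd), iops(LAT_1GBE, qd)))

    # QD  1:  250 vs ~143 IOPs  - latency dominates single-threaded load
    # QD 32: 8000 vs ~4571 IOPs - only on paper; something else limits first

At least by that reading, the extra 1GbE latency mostly hurts
single-threaded clients, which is why ~100 IOPs per thread would still be
acceptable for my use case.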