Hello,

On Thu, 18 Jun 2015 17:48:12 +0200 Jelle de Jong wrote:

> Hello everybody,
>
> I thought I would share the benchmarks from these four SSDs I tested
> (see attachment)
>
None of these are DC-level SSDs of course, though the HyperX at least
supposedly can handle 2.5 DWPD. Alas that info is only in the PDF, not in
the web page specifications, and that PDF also says "not for servers, no
siree". Which can mean a lot of things; the worst would be something like
going _very_ slow when doing housekeeping or the like.

> I do still have some questions:
>
> #1 * Data Set Management TRIM supported (limit 1 block)
>    vs
>    * Data Set Management TRIM supported (limit 8 blocks)
> and how this affects Ceph, and also how can I test if TRIM is actually
> working and not corrupting data.
>
I would not deploy any SSDs that actually require TRIM to maintain their
speed or TBW endurance. And I wouldn't want Ceph to do TRIMs due to the
corruption issues you are already aware of.

And last but not least, TRIM makes little to no sense with Ceph journals.
These are raw partitions, so Ceph would need to issue the TRIM commands.
And since they are constantly being overwritten, trimming them would
certainly be detrimental to performance.

> #2 are there other things I should test to compare SSDs for Ceph
> journals?
>
TBW/$. I couldn't find the endurance data for the Plextor at all.

I have a cluster with journal SSDs that see an average of 2MB/s of
writes, so over 5 years that comes to 315TB, just shy of the 354TB the
128GB HyperX promises. First rule of engineering: overspec by at least
100%, so the 240GB model would be a fit. If one were to use such drives
in the first place.
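To put numbers on that sizing rule, here is a minimal back-of-the-envelope
sketch in Python, assuming the 2MB/s average journal write rate and the
354TB rated endurance mentioned above (illustrative figures from this
thread, not measurements):

# Rough journal SSD endurance sizing.
avg_write_mb_s = 2.0   # observed average journal write rate (MB/s), figure from above
years = 5              # intended service life
rated_tbw_tb = 354     # vendor-rated endurance of the candidate SSD (TB), figure from above

seconds = years * 365 * 24 * 3600
written_tb = avg_write_mb_s * seconds / 1_000_000  # MB -> TB (decimal)

print(f"expected journal writes over {years} years: {written_tb:.0f} TB")
print(f"endurance headroom: {rated_tbw_tb / written_tb:.2f}x (rule of thumb: aim for >= 2x)")

With those inputs it lands at roughly 315TB of writes against 354TB of
rated endurance, i.e. barely 1.1x headroom, which is why the 100%
overspec rule points at the larger model.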
> #3 are the power loss security mechanisms on SSDs relevant in Ceph when
> configured in a way that a full node can fully die and that a power
> loss of all nodes at the same time should not be possible (or has an
> extremely low probability)
>
A full node death is often something you can recover from much faster
than a dead OSD (usually no data loss, just reboot it) and, if Ceph is
configured correctly (mon_osd_down_out_subtree_limit = host), with very
little impact when it comes back.

If your journals are hosed because of a power loss, all the associated
OSDs are dead until you either recreate the journal (if possible) or, in
the worst case (OSD HDD also hosed), the entire OSD.

That said, I personally consider total power loss scenarios in the DCs we
use to be very, very unlikely as well. Others here will strongly disagree
with that, based on their experience.

Penultimately, that doesn't stop folks from accidentally powering off or
unplugging servers. And I have seen SSDs w/o power loss protection
getting hosed in such scenarios while ones with it had no issues.

> #4 how to benchmark the OSD (disk + ssd-journal) combination so I can
> compare them.
>
There are plenty of examples in the archives, from rados bench to fio
with the rbd ioengine to running fio in a VM (for most people the most
realistic test). Block size will of course have a dramatic impact on
throughput, IOPS and CPU utilization.

The fio and dd tests you did are an indication of the capabilities of
those SSDs; those numbers however don't translate directly to Ceph.

Also, once your SSDs are fast enough to ACK things in a timely fashion,
your HDDs will become the bottleneck with persistent loads. For example,
in my cluster with 2 journals per SSD (DC S3700 100GB), a fio run with 4K
blocks will quickly get the CPUs sweating, the HDDs to 100% utilization
and the SSDs to about 10%. However, with 4M blocks the CPUs are nearly
bored, the HDDs are of course at about 100% and the SSDs go up to about
40% (they are approaching their throughput/bandwidth limit of 200MB/s,
not their IOPS limit).

With rados bench I can push the SSDs to 70%, which is one of the reasons
I postulate that HDDs (of the 7.2K RPM SATA persuasion) won't be doing
much over 80MB/s in the best case scenario when being used as OSDs (a
rough sketch of that arithmetic follows below).

Regards,

Christian

> I got some other benchmark questions, but I will make a separate mail
> for them.
>
> Kind regards,
>
> Jelle de Jong

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
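For what it's worth, the 80MB/s estimate above follows from simple
arithmetic; a minimal sketch, assuming the ~200MB/s SSD write bandwidth,
2 journals per SSD and the ~70% utilization seen with rados bench (all
figures taken from this mail, not re-measured):

# Rough per-OSD write ceiling estimate from journal SSD utilization.
ssd_bandwidth_mb_s = 200   # approx. write bandwidth of the journal SSD, figure from above
journals_per_ssd = 2       # OSD journals sharing that SSD
peak_utilization = 0.70    # highest SSD utilization observed (rados bench)

# In steady state an OSD writes roughly the same data to its journal and to
# its HDD, so the per-journal share of the SSD's bandwidth is a decent proxy
# for what the HDD behind it is actually sustaining.
per_osd_mb_s = ssd_bandwidth_mb_s * peak_utilization / journals_per_ssd
print(f"estimated per-OSD (HDD) write rate: ~{per_osd_mb_s:.0f} MB/s")

That works out to roughly 70MB/s per OSD, hence the "not much over
80MB/s in the best case" ceiling for 7.2K RPM SATA drives used as OSDs.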