Thanks everyone for the replies. I will be avoiding TLC drives; they were just something easy to benchmark with existing equipment. I hadn't thought of unscrupulous data-durability claims or performance suddenly tanking in unpredictable ways. I guess it all comes down to trusting the vendor, since it would be expensive in time and money to test for such things.

Any thoughts on multiple Intel 35XX vs a single 36XX/37XX? All have "DC" prefixes and are listed in the Data Center section of their marketing pages, so I assume they'll all have the same quality underlying NAND.

--
Adam Carheden

On 04/26/2017 09:20 AM, Chris Apsey wrote:
> Adam,
>
> Before we deployed our cluster, we did extensive testing on all kinds of SSDs, from consumer-grade TLC SATA all the way to enterprise PCI-E NVMe drives. We ended up going with a ratio of 1x Intel P3608 PCI-E 1.6 TB to 12x HGST 10TB SAS3 HDDs. It provided the best price/performance/density balance for us overall. As a frame of reference, we have 384 OSDs spread across 16 nodes.
>
> A few (anecdotal) notes:
>
> 1. Consumer SSDs have unpredictable performance under load; write latency can go from normal to unusable with almost no warning. Enterprise drives generally show much less load sensitivity.
>
> 2. Write endurance: while it may appear that having several consumer-grade SSDs backing a smaller number of OSDs will yield better longevity than an enterprise-grade SSD backing a larger number of OSDs, the reality is that enterprise drives using SLC or eMLC are generally an order of magnitude more reliable when all is said and done.
>
> 3. Power loss protection (PLP): consumer drives generally don't do well when power is suddenly lost. Yes, we should all have UPSes, etc., but things happen. Enterprise drives are much more tolerant of environmental failures. Recovering from misplaced objects while also attempting to serve clients is no fun.
>
> ---
> v/r
>
> Chris Apsey
> bitskrieg@xxxxxxxxxxxxx
> https://www.bitskrieg.net
>
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the "enterprise" drives are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The others were just ones I had for other purposes, so I thought I'd see how they fared in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that throughput tanked with more threads. Hypothetically, if I were throwing 16 OSDs at it, all with osd op threads = 2, do the benchmarks below not show that the Hynix would be a better choice (at least for performance)?
>>
>> Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC S3610. Obviously the single drive leaves more bays free for OSD disks, but is there any other reason a single S3610 is preferable to 4 S3520s? Wouldn't 4x S3520s mean:
>>
>> a) fewer OSDs go down if the SSD fails
>>
>> b) better throughput (I'm speculating that the S3610 isn't 4 times faster than the S3520)
>>
>> c) load spread across 4 SATA channels (I suppose this doesn't really matter since the drives can't saturate the SATA bus)
>>
>> --
>> Adam Carheden
>>
>> On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
>>> Adam,
>>>
>>> What David said before about SSD drives is very important. I will put it another way: use enterprise-grade SSD drives, not consumer-grade. Also, pay attention to endurance.
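As a rough illustration of the endurance arithmetic (every number below is hypothetical, not taken from this thread):

    average journal write load:  20 MB/s x 86,400 s/day  ~= 1.7 TB/day
    150 GB drive @ 1 DWPD, 5 yr: 0.15 TB x 1 x 365 x 5   ~= 274 TBW  ->  ~5 months at that load
    150 GB drive @ 3 DWPD, 5 yr: 0.15 TB x 3 x 365 x 5   ~= 821 TBW  ->  ~16 months

The quick sanity check is to compare the vendor's TBW/DWPD rating against the write traffic the journal device will actually absorb; with filestore, every client write hits the journal before it hits the data disk.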
>>>
>>> The only suitable drive for Ceph I see in your tests is the SSDSC2BB150G7, and it probably isn't even the most suitable SATA SSD from Intel; better to use the S3610 or S3710 series.
>>>
>>> Cheers
>>> Eneko
>>>
>>> On 25/04/17 at 21:02, Adam Carheden wrote:
>>>> On 04/25/2017 11:57 AM, David wrote:
>>>>> On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx> wrote:
>>>>>
>>>>>     Does anyone know if XFS uses a single thread to write to its journal?
>>>>>
>>>>> You probably know this, but just to avoid any confusion: the journal in this context isn't the metadata journaling in XFS, it's a separate journal written to by the OSD daemons.
>>>> Ha! I didn't know that.
>>>>
>>>>> I think the number of threads per OSD is controlled by the 'osd op threads' setting, which defaults to 2.
>>>> So the ideal (for performance) Ceph cluster would be one SSD per HDD, with 'osd op threads' set to whatever value fio shows as the optimal number of threads for that drive, then?
>>>>
>>>>> I would avoid the SanDisk and Hynix. The S3500 isn't too bad. Perhaps consider going up to a 37xx and putting more OSDs on it, of course with the caveat that you'll lose more OSDs if it goes down.
>>>> Why would you avoid the SanDisk and Hynix? Reliability (I think those two are both TLC)? Brand trust? If it's my benchmarks in my previous email, why not the Hynix? It's slower than the Intel, but sort of decent, at least compared to the SanDisk.
>>>>
>>>> My final numbers are below, including an older Samsung Evo (MLC, I think) which did horribly, though not as badly as the SanDisk. The Seagate is a 10kRPM SAS "spinny" drive I tested as a control/SSD-to-HDD comparison.
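The exact fio invocation isn't quoted anywhere in this thread; a typical journal-style test along these lines (sequential writes with caching disabled via direct, synchronous I/O) would look something like the command below. The device path and 4k block size are assumptions, and writing to the raw device destroys any data on it:

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=8 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test

Here --numjobs corresponds to the "jobs" count swept from 1 to 64 in the results that follow.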
>>>>
>>>> SanDisk SDSSDA240G,           fio  1 jobs:   7.0 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio  2 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio  4 jobs:   7.5 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio  8 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio 16 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio 32 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,           fio 64 jobs:   7.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio  1 jobs:   4.2 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio  2 jobs:   0.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio  4 jobs:   7.5 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio  8 jobs:  17.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio 16 jobs:  32.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio 32 jobs:  64.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10,  fio 64 jobs:  71.6 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio  1 jobs:   2.2 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio  2 jobs:   3.9 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio  4 jobs:   7.1 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio  8 jobs:  12.0 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio 16 jobs:  18.3 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio 32 jobs:  25.4 MB/s (5 trials)
>>>> SAMSUNG SSD,                  fio 64 jobs:  26.5 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio  1 jobs:  91.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio  2 jobs: 132.4 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio  4 jobs: 138.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio  8 jobs: 116.9 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio 16 jobs:  61.8 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio 32 jobs:  22.7 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,          fio 64 jobs:  16.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio  1 jobs:   0.7 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio  2 jobs:   0.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio  4 jobs:   1.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio  8 jobs:   2.0 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio 16 jobs:   4.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio 32 jobs:   6.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,          fio 64 jobs:   0.6 MB/s (5 trials)
>>>>
>>>> For those who come across this and are looking for drives for purposes other than Ceph: those are all sequential write numbers with caching disabled, a very Ceph-journal-specific test. The SanDisk held its own against the Intel in some benchmarks on Windows that didn't disable caching. It may very well be a perfectly good drive for other purposes.
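For reference, the 'osd op threads' setting David mentions above lives in the [osd] section of ceph.conf, next to the filestore journal options; a minimal sketch, with the journal path and size shown as illustrative placeholders rather than recommendations from this thread:

    [osd]
    # OSD op worker threads; the default is 2, as noted above
    osd op threads = 2
    # filestore journal location and size in MB (placeholders; in practice the
    # journal is usually a partition on the SSD, set up when the OSD is created)
    osd journal = /var/lib/ceph/osd/$cluster-$id/journal
    osd journal size = 5120

Whether raising 'osd op threads' to match the fio sweet spot actually helps depends on the rest of the node, so treat it as something to benchmark rather than a given.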