Hi Adam,

How did you settle on the P3608 vs say the P3600 or P3700 for journals? And also the 1.6T size? Seems overkill, unless it's pulling double duty beyond OSD journals.

The only improvement over the P3x00 is the move from x4 lanes to x8 lanes on the PCIe bus, but the P3600/P3700 offer much more in terms of endurance, and at lower prices compared to the P3608.

How big are your journal sizes, or are you over provisioning to increase endurance on the card?

It would seem the new P4800X will be a perfect journaling device with >30DWPD and even lower latency. Even though it is “low” on storage size, 375GB would still hold 15 journals of 25GB each, and 25GB per journal already seems excessively large.

Reed

> On Apr 26, 2017, at 10:20 AM, Chris Apsey <bitskrieg@xxxxxxxxxxxxx> wrote:
>
> Adam,
>
> Before we deployed our cluster, we did extensive testing on all kinds of SSDs, from consumer-grade TLC SATA all the way to Enterprise PCI-E NVMe drives. We ended up going with a ratio of 1x Intel P3608 PCI-E 1.6 TB to 12x HGST 10TB SAS3 HDDs. It provided the best price/performance/density balance for us overall. As a frame of reference, we have 384 OSDs spread across 16 nodes.
>
> A few (anecdotal) notes:
>
> 1. Consumer SSDs have unpredictable performance under load; write latency can go from normal to unusable with almost no warning. Enterprise drives generally show much less load sensitivity.
> 2. Write endurance; while it may appear that having several consumer-grade SSDs backing a smaller number of OSDs will yield better longevity than an enterprise grade SSD backing a larger number of OSDs, the reality is that enterprise drives that use SLC or eMLC are generally an order of magnitude more reliable when all is said and done.
> 3. Power Loss protection (PLP). Consumer drives generally don't do well when power is suddenly lost. Yes, we should all have UPS, etc., but things happen. Enterprise drives are much more tolerant of environmental failures.
> Recovering from misplaced objects while also attempting to serve clients is no fun.
>
> ---
> v/r
>
> Chris Apsey
> bitskrieg@xxxxxxxxxxxxx
> https://www.bitskrieg.net
>
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the "enterprise" drives
>> are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The others were
>> just ones I had for other purposes, so I thought I'd see how they fared
>> in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that throughput
>> tanked with more threads. Hypothetically, if I was throwing 16 OSDs at
>> it, all with osd op threads = 2, do the benchmarks below not show that
>> the Hynix would be a better choice (at least for performance)?
>>
>> Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC S3610. Obviously
>> the single drive leaves more bays free for OSD disks, but is there any
>> other reason a single S3610 is preferable to 4 S3520s? Wouldn't 4xS3520s
>> mean:
>>
>> a) fewer OSDs go down if the SSD fails
>> b) better throughput (I'm speculating that the S3610 isn't 4 times
>> faster than the S3520)
>> c) load spread across 4 SATA channels (I suppose this doesn't really
>> matter since the drives can't throttle the SATA bus).
>>
>> --
>> Adam Carheden
>>
>> On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
>>> Adam,
>>>
>>> What David said before about SSD drives is very important. I will tell
>>> you another way: use enterprise grade SSD drives, not consumer grade.
>>> Also, pay attention to endurance.
>>>
>>> The only suitable drive for Ceph I see in your tests is SSDSC2BB150G7,
>>> and probably it isn't even the most suitable SATA SSD disk from Intel;
>>> better use S3610 or S3710 series.
>>>
>>> Cheers
>>> Eneko
>>>
>>> On 25/04/17 at 21:02, Adam Carheden wrote:
>>>> On 04/25/2017 11:57 AM, David wrote:
>>>>> On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx
>>>>> <mailto:carheden@xxxxxxxx>> wrote:
>>>>>
>>>>> Does anyone know if XFS uses a single thread to write to its
>>>>> journal?
>>>>>
>>>>> You probably know this but just to avoid any confusion, the journal in
>>>>> this context isn't the metadata journaling in XFS, it's a separate
>>>>> journal written to by the OSD daemons
>>>>
>>>> Ha! I didn't know that.
>>>>
>>>>> I think the number of threads per OSD is controlled by the 'osd op
>>>>> threads' setting which defaults to 2
>>>>
>>>> So the ideal (for performance) CEPH cluster would be one SSD per HDD
>>>> with 'osd op threads' set to whatever value fio shows as the optimal
>>>> number of threads for that drive then?
>>>>
>>>>> I would avoid the SanDisk and Hynix. The s3500 isn't too bad. Perhaps
>>>>> consider going up to a 37xx and putting more OSDs on it. Of course with
>>>>> the caveat that you'll lose more OSDs if it goes down.
>>>>
>>>> Why would you avoid the SanDisk and Hynix? Reliability (I think those
>>>> two are both TLC)? Brand trust? If it's my benchmarks in my previous
>>>> email, why not the Hynix? It's slower than the Intel, but sort of
>>>> decent, at least compared to the SanDisk.
>>>>
>>>> My final numbers are below, including an older Samsung Evo (MLC I think)
>>>> which did horribly, though not as bad as the SanDisk. The Seagate is a
>>>> 10kRPM SAS "spinny" drive I tested as a control/SSD-to-HDD comparison.
>>>>
>>>> SanDisk SDSSDA240G, fio 1 jobs: 7.0 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 2 jobs: 7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 4 jobs: 7.5 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 8 jobs: 7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 16 jobs: 7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 32 jobs: 7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G, fio 64 jobs: 7.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 1 jobs: 4.2 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 2 jobs: 0.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 4 jobs: 7.5 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 8 jobs: 17.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 16 jobs: 32.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 32 jobs: 64.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 64 jobs: 71.6 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 1 jobs: 2.2 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 2 jobs: 3.9 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 4 jobs: 7.1 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 8 jobs: 12.0 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 16 jobs: 18.3 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 32 jobs: 25.4 MB/s (5 trials)
>>>> SAMSUNG SSD, fio 64 jobs: 26.5 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 1 jobs: 91.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 2 jobs: 132.4 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 4 jobs: 138.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 8 jobs: 116.9 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 16 jobs: 61.8 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 32 jobs: 22.7 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7, fio 64 jobs: 16.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 1 jobs: 0.7 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 2 jobs: 0.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 4 jobs: 1.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 8 jobs: 2.0 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 16 jobs: 4.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 32 jobs: 6.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS, fio 64 jobs: 0.6 MB/s (5 trials)
>>>>
>>>> For those who come across this and are looking for drives for purposes
>>>> other than CEPH, those are all sequential write numbers with caching
>>>> disabled, a very CEPH-journal-specific test. The SanDisk held its own
>>>> against the Intel using some benchmarks on Windows that didn't disable
>>>> caching. It may very well be a perfectly good drive for other purposes.
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
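
For anyone who wants to redo the "16 OSDs with osd op threads = 2" comparison from the numbers above, here is a rough Python sketch. It parses result lines in the format used in this thread, finds the fio job count where each drive peaked, and looks up throughput at a given writer count. The function names are made up for illustration, and treating "OSDs x osd op threads" as the number of concurrent journal writers is only the hypothetical raised earlier in the thread, not a statement about how Ceph actually schedules journal writes. Only the Hynix and Intel rows are copied in, to keep it short.

```python
import re
from collections import defaultdict

# Rows copied from the benchmark list in this thread
# (sequential writes, caching disabled, MB/s).
RESULTS = """\
HFS250G32TND-N1A2A 30000P10, fio 1 jobs: 4.2 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 2 jobs: 0.6 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 4 jobs: 7.5 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 8 jobs: 17.6 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 16 jobs: 32.4 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 32 jobs: 64.4 MB/s (5 trials)
HFS250G32TND-N1A2A 30000P10, fio 64 jobs: 71.6 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 1 jobs: 91.2 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 2 jobs: 132.4 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 4 jobs: 138.2 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 8 jobs: 116.9 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 16 jobs: 61.8 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 32 jobs: 22.7 MB/s (5 trials)
INTEL SSDSC2BB150G7, fio 64 jobs: 16.9 MB/s (5 trials)
"""

LINE_RE = re.compile(
    r"^(?P<drive>.+), fio (?P<jobs>\d+) jobs: (?P<mbps>[\d.]+) MB/s"
)


def parse_results(text):
    """Return {drive: {job_count: MB/s}} from lines in the thread's format."""
    table = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            table[match["drive"]][int(match["jobs"])] = float(match["mbps"])
    return dict(table)


def best_job_count(by_jobs):
    """Job count at which a drive showed its highest throughput."""
    return max(by_jobs, key=by_jobs.get)


def concurrent_writers(osds, op_threads):
    """Hypothetical writer count: assumes every op thread of every OSD writes."""
    return osds * op_threads


def fastest_at(table, jobs):
    """Drive with the highest measured throughput at exactly `jobs` fio jobs."""
    tested = {drive: r[jobs] for drive, r in table.items() if jobs in r}
    return max(tested, key=tested.get)


if __name__ == "__main__":
    table = parse_results(RESULTS)
    writers = concurrent_writers(osds=16, op_threads=2)
    for drive, by_jobs in table.items():
        print(f"{drive}: peak at {best_job_count(by_jobs)} jobs, "
              f"{by_jobs.get(writers)} MB/s at {writers} jobs")
    print("fastest at", writers, "jobs:", fastest_at(table, writers))
```

On these rows the Intel peaks at 4 jobs and the Hynix at 64, and at 32 jobs the Hynix is ahead (64.4 vs 22.7 MB/s). That only compares raw throughput at one concurrency level; it says nothing about endurance, latency consistency, or power-loss protection, which are the other points raised in this thread.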