Re: Sharing SSD journals and SSD drive choice


 



Adam,

Before we deployed our cluster, we did extensive testing on all kinds of SSDs, from consumer-grade TLC SATA all the way to enterprise PCIe NVMe drives. We ended up going with a ratio of 1x Intel P3608 PCIe 1.6 TB to 12x HGST 10 TB SAS3 HDDs. It provided the best price/performance/density balance for us overall. As a frame of reference, we have 384 OSDs spread across 16 nodes.
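
For anyone wanting to map those figures onto a per-node layout, here is a rough sketch of the arithmetic. It assumes the OSDs and journal devices are spread evenly across the nodes and that every OSD is an HDD journaled on a P3608; neither assumption is stated in the message.

```python
# Figures quoted in the message above.
TOTAL_OSDS = 384
NODES = 16
HDDS_PER_JOURNAL_SSD = 12  # ratio of 1x P3608 to 12x HDDs

# Assumption: even spread of OSDs and journal devices across nodes.
osds_per_node = TOTAL_OSDS // NODES
journal_ssds_per_node = osds_per_node // HDDS_PER_JOURNAL_SSD
journal_ssds_total = TOTAL_OSDS // HDDS_PER_JOURNAL_SSD

print(f"OSDs per node:         {osds_per_node}")
print(f"Journal SSDs per node: {journal_ssds_per_node}")
print(f"Journal SSDs total:    {journal_ssds_total}")
```

Under those assumptions this works out to 24 OSDs and 2 journal devices per node.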

A few (anecdotal) notes:

1. Consumer SSDs have unpredictable performance under load; write latency can go from normal to unusable with almost no warning. Enterprise drives generally show much less load sensitivity.

2. Write endurance. While it may appear that having several consumer-grade SSDs backing a smaller number of OSDs will yield better longevity than an enterprise-grade SSD backing a larger number of OSDs, the reality is that enterprise drives that use SLC or eMLC are generally an order of magnitude more reliable when all is said and done.

3. Power loss protection (PLP). Consumer drives generally don't do well when power is suddenly lost. Yes, we should all have UPS, etc., but things happen. Enterprise drives are much more tolerant of environmental failures. Recovering from misplaced objects while also attempting to serve clients is no fun.





---
v/r

Chris Apsey
bitskrieg@xxxxxxxxxxxxx
https://www.bitskrieg.net

On 2017-04-26 10:53, Adam Carheden wrote:
What I'm trying to get from the list is /why/ the "enterprise" drives
are important. Performance? Reliability? Something else?

The Intel was the only one I was seriously considering. The others were
just ones I had for other purposes, so I thought I'd see how they fared
in benchmarks.

The Intel was the clear winner, but my tests did show that throughput
tanked with more threads. Hypothetically, if I was throwing 16 OSDs at
it, all with osd op threads = 2, do the benchmarks below not show that
the Hynix would be a better choice (at least for performance)?

Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC S3610. Obviously
the single drive leaves more bays free for OSD disks, but is there any
other reason a single S3610 is preferable to 4 S3520s? Wouldn't 4xS3520s
mean:

a) fewer OSDs go down if the SSD fails

b) better throughput (I'm speculating that the S3610 isn't 4 times
faster than the S3520)

c) load spread across 4 SATA channels (I suppose this doesn't really
matter since the drives can't throttle the SATA bus).
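
Point (a) can be put into numbers with a small sketch. It uses the hypothetical 16 OSDs mentioned earlier in this message and assumes the OSDs are split as evenly as possible across the journal SSDs; the function name is made up for illustration.

```python
import math

def osds_lost_per_ssd_failure(osds: int, journal_ssds: int) -> int:
    """Worst-case number of OSDs that go down when one journal SSD fails.

    Assumption: OSDs are distributed as evenly as possible across the SSDs.
    """
    return math.ceil(osds / journal_ssds)

OSDS = 16  # hypothetical OSD count from the question above

for ssds in (1, 4):
    lost = osds_lost_per_ssd_failure(OSDS, ssds)
    print(f"{ssds} journal SSD(s): one SSD failure takes down {lost} of {OSDS} OSDs")
```

The same reasoning frames point (b): for the single drive to match the aggregate throughput of four, it would have to be four times as fast, provided the four smaller drives scale linearly.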


--
Adam Carheden

On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
Adam,

What David said before about SSD drives is very important. I will tell
you another way: use enterprise grade SSD drives, not consumer grade.
Also, pay attention to endurance.

The only suitable drive for Ceph I see in your tests is SSDSC2BB150G7,
and probably it isn't even the most suitable SATA SSD disk from Intel;
better use S3610 or S3710 series.

Cheers
Eneko

On 25/04/17 at 21:02, Adam Carheden wrote:
On 04/25/2017 11:57 AM, David wrote:
On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx
<mailto:carheden@xxxxxxxx>> wrote:

     Does anyone know if XFS uses a single thread to write to its
journal?


You probably know this, but just to avoid any confusion: the journal in
this context isn't the metadata journaling in XFS, it's a separate
journal written to by the OSD daemons.

Ha! I didn't know that.

I think the number of threads per OSD is controlled by the 'osd op
threads' setting, which defaults to 2.

So the ideal (for performance) CEPH cluster would be one SSD per HDD
with 'osd op threads' set to whatever value fio shows as the optimal
number of threads for that drive, then?

I would avoid the SanDisk and Hynix. The S3500 isn't too bad. Perhaps
consider going up to a 37xx and putting more OSDs on it. Of course with
the caveat that you'll lose more OSDs if it goes down.

Why would you avoid the SanDisk and Hynix? Reliability (I think those
two are both TLC)? Brand trust? If it's my benchmarks in my previous
email, why not the Hynix? It's slower than the Intel, but sort of
decent, at least compared to the SanDisk.

My final numbers are below, including an older Samsung Evo (MLC, I think), which did horribly, though not as badly as the SanDisk. The Seagate is a 10k RPM SAS "spinny" drive I tested as a control/SSD-to-HDD comparison.

Throughput in MB/s by number of fio jobs (5 trials each):

Drive                            1      2      4      8     16     32     64
SanDisk SDSSDA240G             7.0    7.6    7.5    7.6    7.6    7.6    7.6
HFS250G32TND-N1A2A 30000P10    4.2    0.6    7.5   17.6   32.4   64.4   71.6
SAMSUNG SSD                    2.2    3.9    7.1   12.0   18.3   25.4   26.5
INTEL SSDSC2BB150G7           91.2  132.4  138.2  116.9   61.8   22.7   16.9
SEAGATE ST9300603SS            0.7    0.9    1.6    2.0    4.6    6.9    0.6
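
Taking the figures above at face value, the "16 OSDs with osd op threads = 2" question can be checked with a small lookup. This is only a sketch: it assumes the number of concurrent journal writers equals OSDs times op threads, which is the premise of the question and not something confirmed in this thread. The names `RESULTS_MB_S`, `concurrent_writers` and `rank_drives` are made up for illustration.

```python
# Throughput in MB/s keyed by drive and fio job count, copied from the results above.
RESULTS_MB_S = {
    "SanDisk SDSSDA240G": {1: 7.0, 2: 7.6, 4: 7.5, 8: 7.6, 16: 7.6, 32: 7.6, 64: 7.6},
    "HFS250G32TND-N1A2A 30000P10": {1: 4.2, 2: 0.6, 4: 7.5, 8: 17.6, 16: 32.4, 32: 64.4, 64: 71.6},
    "SAMSUNG SSD": {1: 2.2, 2: 3.9, 4: 7.1, 8: 12.0, 16: 18.3, 32: 25.4, 64: 26.5},
    "INTEL SSDSC2BB150G7": {1: 91.2, 2: 132.4, 4: 138.2, 8: 116.9, 16: 61.8, 32: 22.7, 64: 16.9},
    "SEAGATE ST9300603SS": {1: 0.7, 2: 0.9, 4: 1.6, 8: 2.0, 16: 4.6, 32: 6.9, 64: 0.6},
}

def concurrent_writers(osds: int, op_threads: int) -> int:
    """Assumed writer count: every op thread of every OSD writes to the journal."""
    return osds * op_threads

def rank_drives(jobs: int):
    """Return (drive, MB/s) pairs for a tested job count, fastest first."""
    rows = [(drive, by_jobs[jobs]) for drive, by_jobs in RESULTS_MB_S.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

jobs = concurrent_writers(osds=16, op_threads=2)
for drive, mb_s in rank_drives(jobs):
    print(f"{drive:>28}: {mb_s:6.1f} MB/s at {jobs} jobs")
```

On raw throughput at 32 jobs the Hynix does come out ahead of the Intel in these numbers; the lookup says nothing about endurance, latency consistency or power loss behaviour.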

For those who come across this and are looking for drives for purposes
other than CEPH, those are all sequential write numbers with caching
disabled, a very CEPH-journal-specific test. The SanDisk held its own
against the Intel using some benchmarks on Windows that didn't disable
caching. It may very well be a perfectly good drive for other purposes.
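
For readers who want to run a comparable test, here is a hedged sketch that builds (but does not run) an fio command line for each job count. The exact options used for the numbers above are not given in this message; the 4k block size, `--sync=1`, the 60 second runtime and the `/dev/sdX` placeholder are assumptions based on the commonly used journal-style test. Pointing fio at a raw device overwrites whatever is on it.

```python
import shlex
from typing import List

def journal_fio_command(device: str, numjobs: int,
                        runtime_s: int = 60, block_size: str = "4k") -> List[str]:
    """Build an fio invocation for sequential, cache-bypassing, synchronous writes.

    Assumptions (not from the thread): block size, runtime and --sync=1.
    """
    return [
        "fio",
        f"--name=journal-test-{numjobs}",
        f"--filename={device}",      # WARNING: a raw device here will be overwritten
        "--rw=write",                # sequential writes
        "--direct=1",                # bypass the page cache
        "--sync=1",                  # synchronous (O_SYNC) writes
        f"--bs={block_size}",
        "--iodepth=1",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",         # one aggregate result for all jobs
    ]

# Print one command per job count used in the results above.
for jobs in (1, 2, 4, 8, 16, 32, 64):
    print(shlex.join(journal_fio_command("/dev/sdX", jobs)))
```

The commands are only printed so they can be reviewed before anything touches a disk.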

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


