Re: Sharing SSD journals and SSD drive choice

Thanks everyone for the replies.

I will be avoiding TLC drives; they were just something easy to benchmark
with existing equipment. I hadn't thought of vendors misrepresenting data
durability or of performance suddenly tanking in unpredictable ways. I
guess it all comes down to trusting the vendor, since it would be expensive
in time and $$ to test for such things.

Any thoughts on multiple Intel 35XX vs a single 36XX/37XX? All have "DC"
prefixes and are listed in the Data Center section of their marketing
pages, so I assume they'll all have the same quality underlying NAND.

-- 
Adam Carheden


On 04/26/2017 09:20 AM, Chris Apsey wrote:
> Adam,
> 
> Before we deployed our cluster, we did extensive testing on all kinds of
> SSDs, from consumer-grade TLC SATA all the way to Enterprise PCI-E NVME
> Drives.  We ended up going with a ratio of 1x Intel P3608 PCI-E 1.6 TB
> to 12x HGST 10TB SAS3 HDDs.  It provided the best
> price/performance/density balance for us overall.  As a frame of
> reference, we have 384 OSDs spread across 16 nodes.
> 
> A few (anecdotal) notes:
> 
> 1. Consumer SSDs have unpredictable performance under load; write
> latency can go from normal to unusable with almost no warning. 
> Enterprise drives generally show much less load sensitivity.
> 2. Write endurance; while it may appear that having several
> consumer-grade SSDs backing a smaller number of OSDs will yield better
> longevity than an enterprise grade SSD backing a larger number of OSDs,
> the reality is that enterprise drives that use SLC or eMLC are generally
> an order of magnitude more reliable when all is said and done.
> 3. Power Loss protection (PLP).  Consumer drives generally don't do well
> when power is suddenly lost.  Yes, we should all have UPS, etc., but
> things happen.  Enterprise drives are much more tolerant of
> environmental failures.  Recovering from misplaced objects while also
> attempting to serve clients is no fun.
> 
> 
> 
> 
> 
> ---
> v/r
> 
> Chris Apsey
> bitskrieg@xxxxxxxxxxxxx
> https://www.bitskrieg.net
> 
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the "enterprise" drives
>> are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The others were
>> just ones I had for other purposes, so I thought I'd see how they fared
>> in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that throughput
>> tanked with more threads. Hypothetically, if I was throwing 16 OSDs at
>> it, all with osd op threads = 2, do the benchmarks below not show that
>> the Hynix would be a better choice (at least for performance)?
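
One way to read the numbers further down for this question is to look up each drive at the job count that matches the expected concurrency. The sketch below assumes that 16 OSDs with osd op threads = 2 behave like 32 concurrent fio jobs, which is only an approximation. The figures are copied from the results in this thread; the names RESULTS_MBPS and best_drive_at are made up for illustration.

```python
# Sequential-write throughput in MB/s, keyed by drive and fio job count.
# Figures are copied from the benchmark results posted in this thread.
RESULTS_MBPS = {
    "INTEL SSDSC2BB150G7": {1: 91.2, 2: 132.4, 4: 138.2, 8: 116.9,
                            16: 61.8, 32: 22.7, 64: 16.9},
    "HFS250G32TND-N1A2A (Hynix)": {1: 4.2, 2: 0.6, 4: 7.5, 8: 17.6,
                                   16: 32.4, 32: 64.4, 64: 71.6},
}


def best_drive_at(jobs, results=RESULTS_MBPS):
    """Return (drive, MB/s) for the fastest drive at the given job count."""
    drive = max(results, key=lambda name: results[name][jobs])
    return drive, results[drive][jobs]


# Assumption: each OSD op thread behaves like one fio job.
osds, op_threads = 16, 2
concurrent_writers = osds * op_threads  # 32

for jobs in (2, concurrent_writers):
    drive, mbps = best_drive_at(jobs)
    print(f"{jobs:2d} jobs: {drive} at {mbps} MB/s")
```

On these numbers the Intel is ahead at 2 jobs and the Hynix at 32. This only compares throughput; it says nothing about endurance or behaviour on power loss.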
>>
>> Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC S3610. Obviously
>> the single drive leaves more bays free for OSD disks, but is there any
>> other reason a single S3610 is preferable to 4 S3520s? Wouldn't 4xS3520s
>> mean:
>>
>> a) fewer OSDs go down if the SSD fails
>>
>> b) better throughput (I'm speculating that the S3610 isn't 4 times
>> faster than the S3520)
>>
>> c) load spread across 4 SATA channels (I suppose this doesn't really
>> matter since the drives can't throttle the SATA bus).
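
Point (a) is simple arithmetic. A rough sketch, assuming a hypothetical node with 16 OSDs whose journals are spread evenly across the journal SSDs (the OSD count and the function name are illustrative, not a recommendation):

```python
import math


def osds_lost_per_ssd_failure(total_osds, journal_ssds):
    """Worst-case number of OSDs that go down when one journal SSD fails,
    assuming journals are spread as evenly as possible across the SSDs."""
    return math.ceil(total_osds / journal_ssds)


total_osds = 16  # hypothetical node size
for ssds, label in ((1, "1 x S3610"), (4, "4 x S3520")):
    lost = osds_lost_per_ssd_failure(total_osds, ssds)
    print(f"{label}: {lost} of {total_osds} OSDs down per SSD failure")
```

Four drives also means four devices that can fail, so a failure of some kind becomes more likely even though each one takes out fewer OSDs.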
>>
>>
>> -- 
>> Adam Carheden
>>
>> On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
>>> Adam,
>>>
>>> What David said before about SSD drives is very important. I will tell
>>> you another way: use enterprise grade SSD drives, not consumer grade.
>>> Also, pay attention to endurance.
>>>
>>> The only suitable drive for Ceph I see in your tests is SSDSC2BB150G7,
>>> and probably it isn't even the most suitable SATA SSD disk from Intel;
>>> better use S3610 or S3710 series.
>>>
>>> Cheers
>>> Eneko
>>>
>>> On 25/04/17 at 21:02, Adam Carheden wrote:
>>>> On 04/25/2017 11:57 AM, David wrote:
>>>>> On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx
>>>>> <mailto:carheden@xxxxxxxx>> wrote:
>>>>>
>>>>>      Does anyone know if XFS uses a single thread to write to its
>>>>> journal?
>>>>>
>>>>>
>>>>> You probably know this but just to avoid any confusion, the journal in
>>>>> this context isn't the metadata journaling in XFS, it's a separate
>>>>> journal written to by the OSD daemons
>>>> Ha! I didn't know that.
>>>>
>>>>> I think the number of threads per OSD is controlled by the 'osd op
>>>>> threads' setting which defaults to 2
>>>> So the ideal (for performance) CEPH cluster would be one SSD per HDD
>>>> with 'osd op threads' set to whatever value fio shows as the optimal
>>>> number of threads for that drive then?
>>>>
>>>>> I would avoid the SanDisk and Hynix. The s3500 isn't too bad. Perhaps
>>>>> consider going up to a 37xx and putting more OSDs on it. Of course
>>>>> with the caveat that you'll lose more OSDs if it goes down.
>>>> Why would you avoid the SanDisk and Hynix? Reliability (I think those
>>>> two are both TLC)? Brand trust? If it's my benchmarks in my previous
>>>> email, why not the Hynix? It's slower than the Intel, but sort of
>>>> decent, at least compared to the SanDisk.
>>>>
>>>> My final numbers are below, including an older Samsung Evo (MLC, I
>>>> think), which did horribly, though not as bad as the SanDisk. The
>>>> Seagate is a 10kRPM SAS "spinny" drive I tested as a control/SSD-to-HDD
>>>> comparison.
>>>>
>>>>           SanDisk SDSSDA240G, fio  1 jobs:   7.0 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  2 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  4 jobs:   7.5 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  8 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 16 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 32 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 64 jobs:   7.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  1 jobs:   4.2 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  2 jobs:   0.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  4 jobs:   7.5 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  8 jobs:  17.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 16 jobs:  32.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 32 jobs:  64.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 64 jobs:  71.6 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  1 jobs:   2.2 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  2 jobs:   3.9 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  4 jobs:   7.1 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  8 jobs:  12.0 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 16 jobs:  18.3 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 32 jobs:  25.4 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 64 jobs:  26.5 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  1 jobs:  91.2 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  2 jobs: 132.4 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  4 jobs: 138.2 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  8 jobs: 116.9 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 16 jobs:  61.8 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 32 jobs:  22.7 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 64 jobs:  16.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  1 jobs:   0.7 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  2 jobs:   0.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  4 jobs:   1.6 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  8 jobs:   2.0 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 16 jobs:   4.6 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 32 jobs:   6.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 64 jobs:   0.6 MB/s (5 trials)
>>>>
>>>> For those who come across this and are looking for drives for purposes
>>>> other than CEPH, those are all sequential write numbers with caching
>>>> disabled, a very CEPH-journal-specific test. The SanDisk held its own
>>>> against the Intel using some benchmarks on Windows that didn't disable
>>>> caching. It may very well be a perfectly good drive for other purposes.
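
For anyone who wants to try something similar without fio, here is a rough Python sketch of a sequential write test that bypasses the OS write-back cache by opening the file with O_DSYNC. It assumes Linux, and it is not the fio command used for the numbers above; the function name, block size, and path are placeholders. It also does nothing about the drive's own volatile write cache, so treat the result as a ballpark only.

```python
import os
import time


def sync_seq_write_mbps(path, block_size=4096, blocks=1024):
    """Write `blocks` blocks sequentially with O_DSYNC and return MB/s.

    O_DSYNC makes each write() wait until the data has been handed to
    stable storage, which is similar in spirit to a journal workload.
    Falls back to O_SYNC where O_DSYNC is not defined.
    """
    sync_flag = getattr(os, "O_DSYNC", os.O_SYNC)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    fd = os.open(path, flags, 0o600)
    buf = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    try:
        for _ in range(blocks):
            written += os.write(fd, buf)
    finally:
        os.close(fd)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Placeholder path: point it at a file on the drive under test.
    print(f"{sync_seq_write_mbps('/tmp/journal-test.bin'):.1f} MB/s")
```

Running several copies of this at once against different files on the same drive would roughly mimic the "jobs" column, though fio handles that far more carefully.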
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>>



