Re: Sharing SSD journals and SSD drive choice

Adam,


2017-04-26 20:54 GMT+05:00 Adam Carheden <carheden@xxxxxxxx>:
Thanks everyone for the replies.

Any thoughts on multiple Intel 35XX vs a single 36XX/37XX? All have "DC"
prefixes and are listed in the Data Center section of their marketing
pages, so I assume they'll all have the same quality underlying NAND.
I would recommend avoiding the S3510 because of its terribly low endurance (0.3 DWPD) and its markedly slower sync writes compared to the S3610. You could look at the Samsung SM863 as an alternative; I've had very promising test results with the 1.92TB version ( https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/#comment-3273882789 ).
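To put 0.3 DWPD in perspective, endurance ratings translate into total bytes written over the warranty period. Here is a back-of-the-envelope sketch; the 3 DWPD figure for the S3610 class and the 5-year warranty are assumptions on my part, so check Intel's spec sheets for the exact TBW ratings:

```python
def endurance_tbw(capacity_gb, dwpd, warranty_years=5):
    """Total terabytes written over the warranty = capacity * DWPD * days."""
    days = warranty_years * 365
    return capacity_gb * dwpd * days / 1000.0

# S3510-class drive: 480 GB at 0.3 DWPD over an assumed 5-year warranty
low = endurance_tbw(480, 0.3)
# S3610-class drive: 480 GB at an assumed 3 DWPD (roughly 10x the endurance)
high = endurance_tbw(480, 3.0)
print(round(low, 1), round(high, 1))  # -> 262.8 2628.0
```

An order of magnitude in rated endurance for the same capacity is why the DWPD number matters more than the sticker price for a journal device.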

I've now done a quick test of the 120GB version, but it is installed in a live (though not at all busy) system, so the results may be slightly lower than they would otherwise be.

hdparm -W 0 /dev/sda
fio --filename=/dev/sda  --direct=1 --sync=1 --rw=write --bs=4k --numjobs=NUM_JOBS --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test

NUM_JOBS = 1:
  write: io=1889.9MB, bw=32253KB/s, iops=8063, runt= 60001msec
    clat (usec): min=104, max=8285, avg=122.94, stdev=29.72

NUM_JOBS = 5:
  write: io=4439.4MB, bw=75764KB/s, iops=18940, runt= 60001msec
    clat (usec): min=119, max=4906, avg=262.54, stdev=78.00

NUM_JOBS = 10:
  write: io=5288.6MB, bw=90256KB/s, iops=22563, runt= 60001msec
    clat (usec): min=120, max=7141, avg=441.03, stdev=94.81

Even though these results are good enough, they are much worse than those of the 1.92TB model.
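Incidentally, if you want to script this NUM_JOBS sweep and collect the numbers programmatically, fio can emit machine-readable output with `--output-format=json`. A minimal parser might look like the following; the sample string and the `clat_ns` field name are assumptions based on recent fio versions (older releases report `clat` in microseconds instead):

```python
import json

def summarize_fio(json_text):
    """Extract write IOPS and mean completion latency (usec) from fio JSON output."""
    data = json.loads(json_text)
    job = data["jobs"][0]["write"]
    iops = job["iops"]
    # Recent fio reports completion latency in nanoseconds under "clat_ns"
    clat_usec = job["clat_ns"]["mean"] / 1000.0
    return iops, clat_usec

# Hypothetical sample shaped like fio's JSON output for the NUM_JOBS=1 run above
sample = '{"jobs": [{"write": {"iops": 8063.0, "clat_ns": {"mean": 122940}}}]}'
print(summarize_fio(sample))  # -> (8063.0, 122.94)
```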

With that in mind, I would not recommend using drives smaller than 240GB, or even 400-480GB, even if that is overkill for a journal: lower-capacity devices are slower than bigger ones.
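On the "overkill" point: the journal partition itself needs very little of that space. The usual FileStore rule of thumb from the Ceph docs is that the journal should absorb two sync intervals of writes, i.e. osd journal size = 2 * expected throughput * filestore max sync interval. A quick sketch with illustrative numbers (the 500 MB/s figure is an assumption, not a measurement):

```python
def journal_size_mb(throughput_mb_s, max_sync_interval_s=5):
    """Ceph FileStore rule of thumb: journal = 2 sync intervals of peak writes."""
    return 2 * throughput_mb_s * max_sync_interval_s

# E.g. an SSD journal absorbing an assumed ~500 MB/s with the
# default 5 s filestore max sync interval
print(journal_size_mb(500))  # -> 5000 (MB), i.e. ~5 GB per journal
```

So even several journals fit easily on a 240GB device; the larger capacity buys you speed and endurance, not space you need.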

Best regards,
Vladimir
 

--
Adam Carheden


On 04/26/2017 09:20 AM, Chris Apsey wrote:
> Adam,
>
> Before we deployed our cluster, we did extensive testing on all kinds of
> SSDs, from consumer-grade TLC SATA all the way to Enterprise PCI-E NVME
> Drives.  We ended up going with a ratio of 1x Intel P3608 PCI-E 1.6 TB
> to 12x HGST 10TB SAS3 HDDs.  It provided the best
> price/performance/density balance for us overall.  As a frame of
> reference, we have 384 OSDs spread across 16 nodes.
>
> A few (anecdotal) notes:
>
> 1. Consumer SSDs have unpredictable performance under load; write
> latency can go from normal to unusable with almost no warning.
> Enterprise drives generally show much less load sensitivity.
> 2. Write endurance: while it may appear that having several
> consumer-grade SSDs backing a smaller number of OSDs will yield better
> longevity than an enterprise grade SSD backing a larger number of OSDs,
> the reality is that enterprise drives that use SLC or eMLC are generally
> an order of magnitude more reliable when all is said and done.
> 3. Power Loss protection (PLP).  Consumer drives generally don't do well
> when power is suddenly lost.  Yes, we should all have UPS, etc., but
> things happen.  Enterprise drives are much more tolerant of
> environmental failures.  Recovering from misplaced objects while also
> attempting to serve clients is no fun.
>
>
>
>
>
> ---
> v/r
>
> Chris Apsey
> bitskrieg@xxxxxxxxxxxxx
> https://www.bitskrieg.net
>
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the "enterprise" drives
>> are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The others were
>> just ones I had for other purposes, so I thought I'd see how they fared
>> in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that throughput
>> tanked with more threads. Hypothetically, if I was throwing 16 OSDs at
>> it, all with osd op threads = 2, do the benchmarks below not show that
>> the Hynix would be a better choice (at least for performance)?
>>
>> Also, 4 x Intel DC S3520 costs as much as 1 x Intel DC S3610. Obviously
>> the single drive leaves more bays free for OSD disks, but is there any
>> other reason a single S3610 is preferable to 4 S3520s? Wouldn't 4xS3520s
>> mean:
>>
>> a) fewer OSDs go down if the SSD fails
>>
>> b) better throughput (I'm speculating that the S3610 isn't 4 times
>> faster than the S3520)
>>
>> c) load spread across 4 SATA channels (I suppose this doesn't really
>> matter since the drives can't throttle the SATA bus).
>>
>>
>> --
>> Adam Carheden
>>
>> On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
>>> Adam,
>>>
>>> What David said before about SSD drives is very important. I will tell
>>> you another way: use enterprise grade SSD drives, not consumer grade.
>>> Also, pay attention to endurance.
>>>
>>> The only suitable drive for Ceph I see in your tests is SSDSC2BB150G7,
>>> and probably it isn't even the most suitable SATA SSD disk from Intel;
>>> better to use the S3610 or S3710 series.
>>>
>>> Cheers
>>> Eneko
>>>
>>> On 25/04/17 at 21:02, Adam Carheden wrote:
>>>> On 04/25/2017 11:57 AM, David wrote:
>>>>> On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx
>>>>> <mailto:carheden@xxxxxxxx>> wrote:
>>>>>
>>>>>      Does anyone know if XFS uses a single thread to write to its
>>>>> journal?
>>>>>
>>>>>
>>>>> You probably know this but just to avoid any confusion, the journal in
>>>>> this context isn't the metadata journaling in XFS, it's a separate
>>>>> journal written to by the OSD daemons
>>>> Ha! I didn't know that.
>>>>
>>>>> I think the number of threads per OSD is controlled by the 'osd op
>>>>> threads' setting which defaults to 2
>>>> So the ideal (for performance) CEPH cluster would be one SSD per HDD
>>>> with 'osd op threads' set to whatever value fio shows as the optimal
>>>> number of threads for that drive then?
>>>>
>>>>> I would avoid the SanDisk and Hynix. The s3500 isn't too bad. Perhaps
>>>>> consider going up to a 37xx and putting more OSDs on it. Of course
>>>>> with
>>>>> the caveat that you'll lose more OSDs if it goes down.
>>>> Why would you avoid the SanDisk and Hynix? Reliability (I think those
>>>> two are both TLC)? Brand trust? If it's my benchmarks in my previous
>>>> email, why not the Hynix? It's slower than the Intel, but sort of
>>>> decent, at least compared to the SanDisk.
>>>>
>>>> My final numbers are below, including an older Samsung Evo (MLC I
>>>> think)
>>>> which did horribly, though not as bad as the SanDisk. The Seagate is a
>>>> 10kRPM SAS "spinny" drive I tested as a control/SSD-to-HDD comparison.
>>>>
>>>>           SanDisk SDSSDA240G, fio  1 jobs:   7.0 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  2 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  4 jobs:   7.5 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio  8 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 16 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 32 jobs:   7.6 MB/s (5 trials)
>>>>           SanDisk SDSSDA240G, fio 64 jobs:   7.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  1 jobs:   4.2 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  2 jobs:   0.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  4 jobs:   7.5 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  8 jobs:  17.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 16 jobs:  32.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 32 jobs:  64.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 64 jobs:  71.6 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  1 jobs:   2.2 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  2 jobs:   3.9 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  4 jobs:   7.1 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio  8 jobs:  12.0 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 16 jobs:  18.3 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 32 jobs:  25.4 MB/s (5 trials)
>>>>                  SAMSUNG SSD, fio 64 jobs:  26.5 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  1 jobs:  91.2 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  2 jobs: 132.4 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  4 jobs: 138.2 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio  8 jobs: 116.9 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 16 jobs:  61.8 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 32 jobs:  22.7 MB/s (5 trials)
>>>>          INTEL SSDSC2BB150G7, fio 64 jobs:  16.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  1 jobs:   0.7 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  2 jobs:   0.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  4 jobs:   1.6 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio  8 jobs:   2.0 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 16 jobs:   4.6 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 32 jobs:   6.9 MB/s (5 trials)
>>>>          SEAGATE ST9300603SS, fio 64 jobs:   0.6 MB/s (5 trials)
>>>>
>>>> For those who come across this and are looking for drives for purposes
>>>> other than CEPH, those are all sequential write numbers with caching
>>>> disabled, a very CEPH-journal-specific test. The SanDisk held its own
>>>> against the Intel using some benchmarks on Windows that didn't disable
>>>> caching. It may very well be a perfectly good drive for other purposes.
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>>



--

With best regards,
Vladimir Drobyshevsky
Компания "АйТи Город" (IT Gorod)
+7 343 2222192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
