Hi,
Has anyone used the new S3520s? They are rated at 1 DWPD, much closer to
the S3610 than previous S35x0 models.
Cheers
On 01/05/17 at 17:41, David Turner wrote:
I can attest to this. I had a cluster that used 3510's for the first rack
and then switched to 3710's after that. We had 3TB drives, and every
single 3510 ran out of writes after 1.5 years. We noticed because we
tracked incredibly slow performance down to a subset of OSDs, and each
time they shared a common journal. This went on for about 2 weeks and 4
journals; that was when we realized they were all 3510 journals, and
SMART showed that not only the journals we had tracked down but all of
the 3510's were out of writes. Replacing all of your journals every 1.5
years is far more expensive than the increased cost of the 3710's. That
was our use case and experience, but I'm pretty sure that any cluster
large enough to fill at least most of a rack will run into this sooner
rather than later.
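For reference, a minimal sketch of how one might watch for this before it
bites, assuming Intel drives that expose the SMART Media_Wearout_Indicator
attribute and smartmontools installed (device names are placeholders):

import subprocess

def media_wearout(dev):
    """Return the normalized Media_Wearout_Indicator value for an Intel SSD
    (100 = new, dropping toward 1 as rated endurance is consumed), or None
    if the attribute isn't reported."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True, check=False).stdout
    for line in out.splitlines():
        if "Media_Wearout_Indicator" in line:
            return int(line.split()[3])  # 4th column is the normalized VALUE
    return None

# Placeholder device names -- point these at your journal SSDs.
for dev in ("/dev/sda", "/dev/sdb"):
    w = media_wearout(dev)
    if w is not None and w <= 10:
        print(f"{dev}: wearout={w} -- journal SSD is nearly out of writes")
    else:
        print(f"{dev}: wearout={w}")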
Hi,
Lots of good info on SSD endurance in this thread.
For Ceph journals you should also consider the size of the backing OSDs:
an SSD journal won't last as long backing 5x8TB OSDs as it would backing
5x1TB OSDs.

For example, a 480GB S3510 (275TB of endurance) backing 5x8TB (40TB) of
OSDs provides very little headroom: assuming triple replication, you will
be able to fill the OSDs roughly twice and that's about it
(275/(5x8)/3 ~ 2.3).

At the other end of the scale, a 1.2TB S3710 (24,300TB of endurance)
backing 5x1TB OSDs will be able to fill them 1620 times before running
out of endurance (24300/(5x1)/3).
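A minimal sketch of that arithmetic, using the endurance figures quoted
above (the function name is just for illustration):

def journal_fills(endurance_tb, num_osds, osd_size_tb, replication=3):
    """How many times the backing OSDs can be filled before the journal SSD
    exhausts its rated endurance, using the same model as above."""
    return endurance_tb / (num_osds * osd_size_tb) / replication

print(journal_fills(275, 5, 8))     # 480GB S3510 over 5x8TB OSDs -> ~2.3 fills
print(journal_fills(24300, 5, 1))   # 1.2TB S3710 over 5x1TB OSDs -> 1620 fills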
Ultimately it depends on your workload. Some people can get away with an
S3510 as a journal if the workload is read-intensive, but in most cases
the higher-endurance S3610 or S3710 is the safer bet.
Cheers,
Maxime
Sorry for top-posting, but...
The Intel 35xx drives are rated for a much lower
DWPD (drive-writes-per-day) than the 36xx or 37xx
models.
Keep in mind that a single SSD acting as the journal for 5 OSDs will
receive ALL writes for those 5 OSDs before the data is moved off to the
OSDs' actual data drives. Combined with the consumer/enterprise advice
others have written about, that means your SSD journal devices will see
a very large volume of writes over time.
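To make that concrete, a minimal sketch of how the daily write load on a
journal SSD scales with the number of OSDs behind it (the per-OSD write
rate below is purely a placeholder):

def journal_dwpd(num_osds, writes_gb_per_osd_per_day, journal_capacity_gb):
    """Drive writes per day seen by a journal SSD that absorbs all writes
    for num_osds OSDs before the data reaches the data drives."""
    return num_osds * writes_gb_per_osd_per_day / journal_capacity_gb

# e.g. 5 OSDs each taking 100GB of writes a day through one 480GB journal SSD
print(journal_dwpd(5, 100, 480))  # ~1.04 DWPD -- already above the S3510's 0.3 rating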
The S3510 is rated for 0.3 DWPD for 5 years (http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3510-spec.html)
The S3610 is rated for 3 DWPD for 5 years (http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3610-spec.html)
The S3710 is rated for 10 DWPD for 5 years (http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3710-spec.html)
A 480GB S3510 has no endurance left once you have
written 0.275PB to it.
A 480GB S3610 has no endurance left once you have
written 3.7PB to it.
A 400GB S3710 has no endurance left once you have
written 8.3PB to it.
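Those totals follow roughly from capacity x DWPD x warranty days. A
minimal sketch of that conversion (approximate only; Intel's published
TBW figures are the authoritative numbers and assume specific workloads,
so they don't match a naive conversion exactly):

def tb_written(capacity_gb, dwpd, years=5):
    """Approximate total terabytes written over the warranty period for a
    given capacity and DWPD rating."""
    return capacity_gb / 1000.0 * dwpd * 365 * years

print(tb_written(480, 0.3))  # ~263 TB  (S3510 480GB; datasheet says 275 TBW)
print(tb_written(480, 3))    # ~2630 TB (S3610 480GB; datasheet says 3.7 PBW)
print(tb_written(400, 10))   # ~7300 TB (S3710 400GB; datasheet says 8.3 PBW)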
This makes for quite a lot of difference over time: even if an S3510 will
only act as a journal for 1 or 2 OSDs, it will wear out much faster than
the higher-endurance models.
And I know I've used the xx10 models above, but the xx00 models have all
been replaced by those newer models now. And yes, the xx10 models use MLC
NAND, but so did the xx00 models, which have a proven track record and
deliver what Intel promised in the datasheet.
You could take a look at some of the enterprise SSDs that Samsung has
launched. Price-wise they are very competitive with Intel, but I would
want to see (or at least hear from others) whether they can deliver what
their datasheets promise. Samsung's consumer SSDs did not (840/850 Pro),
so I'm only using S3710s in my cluster.
Before I built our cluster some time ago, I found these threads on the
mailing list about the exact disks we had been expecting to use (Samsung
840/850 Pro); that plan was quickly changed to Intel S3710s:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044258.html
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg17369.html
A longish thread about Samsung consumer drives:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000572.html
- highlights from that thread:
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000610.html
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000611.html
- http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000798.html
Regards,
Jens Dueholm Christensen
Rambøll Survey IT
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Adam Carheden
Sent: Wednesday, April 26, 2017 5:54 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Sharing SSD journals and
SSD drive choice
Thanks everyone for the replies.
I will be avoiding TLC drives; they were just something easy to benchmark
with existing equipment. I hadn't thought of unscrupulous data-durability
lies or performance suddenly tanking in unpredictable ways. I guess it all
comes down to trusting the vendor, since it would be expensive in time and
money to test for such things.
Any thoughts on multiple Intel 35XX drives vs a single 36XX/37XX? All have
"DC" prefixes and are listed in the Data Center section of Intel's
marketing pages, so I assume they all have the same quality underlying
NAND.
--
Adam Carheden
On 04/26/2017 09:20 AM, Chris Apsey wrote:
> Adam,
>
> Before we deployed our cluster, we did extensive testing on all kinds of
> SSDs, from consumer-grade TLC SATA all the way to enterprise PCI-E NVMe
> drives. We ended up going with a ratio of 1x Intel P3608 PCI-E 1.6TB to
> 12x HGST 10TB SAS3 HDDs. It provided the best price/performance/density
> balance for us overall. As a frame of reference, we have 384 OSDs spread
> across 16 nodes.
>
> A few (anecdotal) notes:
>
> 1. Consumer SSDs have unpredictable performance under load; write
> latency can go from normal to unusable with almost no warning.
> Enterprise drives generally show much less load sensitivity.
> 2. Write endurance: while it may appear that having several
> consumer-grade SSDs backing a smaller number of OSDs will yield better
> longevity than an enterprise-grade SSD backing a larger number of OSDs,
> the reality is that enterprise drives that use SLC or eMLC are generally
> an order of magnitude more reliable when all is said and done.
> 3. Power loss protection (PLP): consumer drives generally don't do well
> when power is suddenly lost. Yes, we should all have UPS, etc., but
> things happen. Enterprise drives are much more tolerant of environmental
> failures. Recovering from misplaced objects while also attempting to
> serve clients is no fun.
>
>
>
>
>
> ---
> v/r
>
> Chris Apsey
> bitskrieg@xxxxxxxxxxxxx
> https://www.bitskrieg.net
>
> On 2017-04-26 10:53, Adam Carheden wrote:
>> What I'm trying to get from the list is /why/ the "enterprise" drives
>> are important. Performance? Reliability? Something else?
>>
>> The Intel was the only one I was seriously considering. The others were
>> just ones I had for other purposes, so I thought I'd see how they fared
>> in benchmarks.
>>
>> The Intel was the clear winner, but my tests did show that throughput
>> tanked with more threads. Hypothetically, if I were throwing 16 OSDs at
>> it, all with osd op threads = 2, don't the benchmarks below show that
>> the Hynix would be a better choice (at least for performance)?
>>
>> Also, 4x Intel DC S3520 costs as much as 1x Intel DC S3610. Obviously
>> the single drive leaves more bays free for OSD disks, but is there any
>> other reason a single S3610 is preferable to 4x S3520? Wouldn't 4x
>> S3520 mean:
>>
>> a) fewer OSDs go down if an SSD fails
>>
>> b) better throughput (I'm speculating that the S3610 isn't 4 times
>> faster than the S3520)
>>
>> c) load spread across 4 SATA channels (I suppose this doesn't really
>> matter since the drives can't throttle the SATA bus)?
>>
>>
>> --
>> Adam Carheden
>>
>> On 04/26/2017 01:55 AM, Eneko Lacunza wrote:
>>> Adam,
>>>
>>> What David said before about SSD drives is very important. Let me put
>>> it another way: use enterprise-grade SSD drives, not consumer-grade.
>>> Also, pay attention to endurance.
>>>
>>> The only suitable drive for Ceph I see in your tests is the
>>> SSDSC2BB150G7, and it probably isn't even the most suitable SATA SSD
>>> from Intel; better to use the S3610 or S3710 series.
>>>
>>> Cheers
>>> Eneko
>>>
>>> On 25/04/17 at 21:02, Adam Carheden wrote:
>>>> On 04/25/2017 11:57 AM, David wrote:
>>>>> On 19 Apr 2017 18:01, "Adam Carheden" <carheden@xxxxxxxx
>>>>> <mailto:carheden@xxxxxxxx>> wrote:
>>>>>
>>>>>     Does anyone know if XFS uses a single thread to write to its
>>>>>     journal?
>>>>>
>>>>> You probably know this, but just to avoid any confusion: the journal
>>>>> in this context isn't the metadata journaling in XFS, it's a separate
>>>>> journal written to by the OSD daemons.
>>>> Ha! I didn't know that.
>>>>
>>>>> I think the number of threads per OSD is controlled by the 'osd op
>>>>> threads' setting, which defaults to 2.
>>>> So the ideal (for performance) CEPH cluster would be one SSD per HDD,
>>>> with 'osd op threads' set to whatever value fio shows as the optimal
>>>> number of threads for that drive, then?
>>>>
>>>>> I would avoid the SanDisk and Hynix. The s3500 isn't too bad. Perhaps
>>>>> consider going up to a 37xx and putting more OSDs on it, of course
>>>>> with the caveat that you'll lose more OSDs if it goes down.
>>>> Why would you avoid the SanDisk and Hynix? Reliability (I think those
>>>> two are both TLC)? Brand trust? If it's my benchmarks in my previous
>>>> email, why not the Hynix? It's slower than the Intel, but sort of
>>>> decent, at least compared to the SanDisk.
>>>>
>>>> My final numbers are below, including an older Samsung Evo (MLC, I
>>>> think) which did horribly, though not as badly as the SanDisk. The
>>>> Seagate is a 10kRPM SAS "spinny" drive I tested as a control /
>>>> SSD-to-HDD comparison.
>>>>
>>>> SanDisk SDSSDA240G,          fio  1 jobs:   7.0 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio  2 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio  4 jobs:   7.5 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio  8 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio 16 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio 32 jobs:   7.6 MB/s (5 trials)
>>>> SanDisk SDSSDA240G,          fio 64 jobs:   7.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  1 jobs:   4.2 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  2 jobs:   0.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  4 jobs:   7.5 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio  8 jobs:  17.6 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 16 jobs:  32.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 32 jobs:  64.4 MB/s (5 trials)
>>>> HFS250G32TND-N1A2A 30000P10, fio 64 jobs:  71.6 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio  1 jobs:   2.2 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio  2 jobs:   3.9 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio  4 jobs:   7.1 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio  8 jobs:  12.0 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio 16 jobs:  18.3 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio 32 jobs:  25.4 MB/s (5 trials)
>>>> SAMSUNG SSD,                 fio 64 jobs:  26.5 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio  1 jobs:  91.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio  2 jobs: 132.4 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio  4 jobs: 138.2 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio  8 jobs: 116.9 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio 16 jobs:  61.8 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio 32 jobs:  22.7 MB/s (5 trials)
>>>> INTEL SSDSC2BB150G7,         fio 64 jobs:  16.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio  1 jobs:   0.7 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio  2 jobs:   0.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio  4 jobs:   1.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio  8 jobs:   2.0 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio 16 jobs:   4.6 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio 32 jobs:   6.9 MB/s (5 trials)
>>>> SEAGATE ST9300603SS,         fio 64 jobs:   0.6 MB/s (5 trials)
>>>>
>>>> For those who come across this and are looking for drives for purposes
>>>> other than CEPH: those are all sequential write numbers with caching
>>>> disabled, a very CEPH-journal-specific test. The SanDisk held its own
>>>> against the Intel in some benchmarks on Windows that didn't disable
>>>> caching, so it may very well be a perfectly good drive for other
>>>> purposes.
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es