I just ran a test on a Samsung 850 Pro 500GB (how should I interpret the following output?)

[root@compute-01 tmp]# fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=76.0MiB/s][r=0,w=19.7k IOPS][eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=6969: Mon Jul 16 14:21:27 2018
  write: IOPS=20.1k, BW=78.6MiB/s (82.5MB/s)(4719MiB/60001msec)
    clat (usec): min=36, max=4525, avg=47.22, stdev=16.65
     lat (usec): min=36, max=4526, avg=47.57, stdev=16.69
    clat percentiles (usec):
     |  1.00th=[   39],  5.00th=[   40], 10.00th=[   40], 20.00th=[   41],
     | 30.00th=[   43], 40.00th=[   48], 50.00th=[   49], 60.00th=[   50],
     | 70.00th=[   50], 80.00th=[   51], 90.00th=[   52], 95.00th=[   53],
     | 99.00th=[   62], 99.50th=[   65], 99.90th=[  108], 99.95th=[  363],
     | 99.99th=[  396]
   bw (  KiB/s): min=72152, max=96464, per=100.00%, avg=80581.45, stdev=7032.18, samples=119
   iops        : min=18038, max=24116, avg=20145.34, stdev=1758.05, samples=119
  lat (usec)   : 50=71.83%, 100=28.06%, 250=0.03%, 500=0.08%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%
  cpu          : usr=9.44%, sys=31.95%, ctx=1209952, majf=0, minf=78
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,1207979,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=78.6MiB/s (82.5MB/s), 78.6MiB/s-78.6MiB/s (82.5MB/s-82.5MB/s), io=4719MiB (4948MB), run=60001-60001msec

Disk stats (read/write):
  sda: ios=0/1205921, merge=0/29, ticks=0/41418, in_queue=40965, util=68.35%
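As a quick sanity check on the numbers above: the bandwidth and IOPS lines describe the same workload, since 78.6 MiB/s of 4 KiB writes works out to roughly 20.1k writes per second, matching "IOPS=20.1k". Below is a rough sketch of that arithmetic plus the dd-based checks discussed further down in the thread (the 4k direct+dsync sync-write test and an incompressible-data variant of the blog's sequential test). This assumes GNU dd and bc are available; /dev/sdX and the counts are placeholders, and both dd runs overwrite their target, so only point them at a scratch device or file:

# Cross-check bandwidth vs. IOPS from the fio summary:
# 78.6 MiB/s at 4 KiB per write = 78.6 * 1024 / 4 ≈ 20.1k IOPS
echo "78.6 * 1024 / 4" | bc            # prints ~20121 (writes per second)

# Worst-case sync-write estimate with dd (the "4k oflag=direct,dsync" test
# mentioned below). WARNING: overwrites the start of /dev/sdX.
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync

# Incompressible-data variant of the blog's sequential test (urandom instead
# of zero, as suggested below); note /dev/urandom itself can be the bottleneck.
dd if=/dev/urandom of=/mnt/rawdisk/data.bin bs=1G count=20 iflag=fullblock oflag=direct
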
On Mon, Jul 16, 2018 at 1:18 PM, Michael Kuriger <mk7193@xxxxxxxxx> wrote:
> I dunno, to me benchmark tests are only really useful to compare different drives.
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Paul Emmerich
> Sent: Monday, July 16, 2018 8:41 AM
> To: Satish Patel
> Cc: ceph-users
> Subject: Re: SSDs for data drives
>
> This doesn't look like a good benchmark:
>
> (from the blog post)
> dd if=/dev/zero of=/mnt/rawdisk/data.bin bs=1G count=20 oflag=direct
>
> 1. it writes compressible data, which some SSDs might compress; you should use urandom
> 2. that workload does not look like something Ceph will do to your disk, like not at all
>
> If you want a quick estimate of an SSD in a worst-case scenario: run the usual
> 4k oflag=direct,dsync test (or better: fio).
> A bad SSD will get < 1k IOPS, a good one > 10k.
>
> But that doesn't test everything. In particular, performance might degrade as
> the disks fill up. Also, it's the absolute worst case, i.e., a disk used for
> multiple journal/wal devices.
>
> Paul
>
> 2018-07-16 10:09 GMT-04:00 Satish Patel <satish.txt@xxxxxxxxx>:
> https://blog.cypressxt.net/hello-ceph-and-samsung-850-evo/
>
> On Thu, Jul 12, 2018 at 3:37 AM, Adrian Saul <Adrian.Saul@xxxxxxxxxxxxxxxxx> wrote:
>> We started our cluster with consumer (Samsung EVO) disks and the write
>> performance was pitiful; they had periodic spikes in latency (average of
>> 8ms, but much higher spikes) and just did not perform anywhere near where
>> we were expecting.
>>
>> When replaced with SM863-based devices the difference was night and day.
>> The DC-grade disks held a nearly constant low latency (constantly sub-ms),
>> no spiking, and performance was massively better. For a period I ran both
>> disks in the cluster and was able to graph them side by side with the same
>> workload. This was not even a moderately loaded cluster, so I am glad we
>> discovered this before we went full scale.
>>
>> So while you certainly can do cheap and cheerful and let the data
>> availability be handled by Ceph, don't expect the performance to keep up.
>>
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Satish Patel
>> Sent: Wednesday, 11 July 2018 10:50 PM
>> To: Paul Emmerich <paul.emmerich@xxxxxxxx>
>> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: SSDs for data drives
>>
>> Prices go way up if I am picking Samsung SM863a for all data drives.
>>
>> We have many servers running on consumer-grade SSD drives and we have never
>> noticed any performance issues or faults so far (but we have never used Ceph
>> before).
>>
>> I thought that is the whole point of Ceph: to provide high availability if a
>> drive goes down, and also parallel reads from multiple OSD nodes.
>>
>> Sent from my iPhone
>>
>> On Jul 11, 2018, at 6:57 AM, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>> Hi,
>>
>> we've no long-term data for the SM variant.
>> Performance is fine as far as we can tell, but the main difference between
>> these two models should be endurance.
>>
>> Also, I forgot to mention that my experiences are only for the 1, 2, and 4 TB
>> variants. Smaller SSDs are often proportionally slower (especially below 500GB).
>>
>> Paul
>>
>> Robert Stanford <rstanford8896@xxxxxxxxx>:
>> Paul -
>>
>> That's extremely helpful, thanks. I do have another cluster that uses
>> Samsung SM863a just for journals (spinning disks for data). Do you happen
>> to have an opinion on those as well?
>>
>> On Wed, Jul 11, 2018 at 4:03 AM, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>> PM/SM863a are usually great disks and should be the default go-to option;
>> they outperform even the more expensive PM1633 in our experience.
>> (But that really doesn't matter if it's for the full OSD and not as a
>> dedicated WAL/journal.)
>>
>> We got a cluster with a few hundred SanDisk Ultra II (discontinued, I
>> believe) that was built on a budget. Not the best disk, but great value.
>> They have been running for ~3 years now with very few failures and
>> okayish overall performance.
>>
>> We also got a few clusters with a few hundred SanDisk Extreme Pro, but we
>> are not yet sure about their long-term durability as they are only ~9
>> months old (average of ~1000 write IOPS on each disk over that time).
>> Some of them report only 50-60% lifetime left.
>>
>> For NVMe, the Intel NVMe 750 is still a great disk.
>>
>> Be careful to get these exact models. Seemingly similar disks might be
>> just completely bad; for example, the Samsung PM961 is just unusable for
>> Ceph in our experience.
>>
>> Paul
>>
>> 2018-07-11 10:14 GMT+02:00 Wido den Hollander <wido@xxxxxxxx>:
>>
>> On 07/11/2018 10:10 AM, Robert Stanford wrote:
>>> In a recent thread the Samsung SM863a was recommended as a journal SSD.
>>> Are there any recommendations for data SSDs, for people who want to use
>>> just SSDs in a new Ceph cluster?
>>
>> Depends on what you are looking for, SATA, SAS3 or NVMe?
>>
>> I have very good experiences with these drives running with BlueStore in
>> them in SuperMicro machines:
>>
>> - SATA: Samsung PM863a
>> - SATA: Intel S4500
>> - SAS: Samsung PM1633
>> - NVMe: Samsung PM963
>>
>> Running WAL+DB+DATA with BlueStore on the same drives.
>>
>> Wido
>>
>>> Thank you
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com