Re: Again - state of Ceph NVMe and SSDs


Not at all! For all we know, the drives may be the fastest ones on the planet. My comment still stands, though. Be skeptical of any one benchmark that shows something unexpected. 90% of native SSD/NVMe IOPS performance in a distributed storage system is just such a number. Look for test repeatability. Look for independent verification. The memory allocator tests that SanDisk initially performed, and that were later independently verified by Intel, Red Hat, and others, are a great example of this kind of process.
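
As a rough illustration of what "test repeatability" can mean in practice, here is a minimal Python sketch that summarizes several runs of the same benchmark and flags a set of runs whose spread is too large to trust. The function names, the 5% threshold, and the IOPS figures are all made up for illustration; none of them come from this thread.

```python
import statistics

def summarize_runs(iops_runs):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(iops_runs)
    stdev = statistics.stdev(iops_runs)  # needs at least two runs
    return mean, stdev, stdev / mean

def is_repeatable(iops_runs, max_cv=0.05):
    """Treat runs as repeatable if their relative spread is at most max_cv.

    The 5% default is an arbitrary illustrative threshold, not a standard.
    """
    _, _, cv = summarize_runs(iops_runs)
    return cv <= max_cv

# Hypothetical IOPS results from five repetitions of the same test.
steady_runs = [41000, 39500, 40200, 40800, 39900]
# Same set with one surprising result mixed in.
suspect_runs = [41000, 39500, 90000, 40800, 39900]

if __name__ == "__main__":
    for name, runs in (("steady", steady_runs), ("suspect", suspect_runs)):
        mean, stdev, cv = summarize_runs(runs)
        print(f"{name}: mean={mean:.0f} stdev={stdev:.0f} cv={cv:.3f} "
              f"repeatable={is_repeatable(runs)}")
```

A single headline number corresponds to a list of length one here, which is exactly the case where no spread can be computed at all.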

Mark

On 01/19/2016 07:01 AM, Tyler Bishop wrote:
It sounds like you're just assuming these drives don't perform well...

----- Original Message -----
From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Monday, January 18, 2016 2:17:19 PM
Subject: Re:  Again - state of Ceph NVMe and SSDs

Take Greg's comments to heart, because he's absolutely correct here.
Distributed storage systems almost as a rule love parallelism and if you
have enough you can often hide other issues.  Latency is probably the
more interesting question, and frankly that's where you'll often start
seeing the kernel, ceph code, drivers, random acts of god, etc, get in
the way.  It's very easy for any one of these things to destroy your
performance, so you have to be *very* *very* careful to understand
exactly what you are seeing.  As such, don't trust any one benchmark.
Wait until it's independently verified, possibly by multiple sources,
before putting too much weight into it.
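
To make the parallelism-versus-latency point concrete, here is a small Python sketch based on Little's law (throughput = concurrency / latency). It is an idealized upper bound that assumes per-request latency stays constant as queue depth grows, which real clusters do not guarantee; the function name and the 1 ms latency figure are illustrative assumptions.

```python
def iops_upper_bound(queue_depth, latency_ms):
    """Little's law: at most queue_depth requests complete every latency_ms.

    Idealized: assumes latency does not rise as more requests are in flight.
    """
    return queue_depth * 1000.0 / latency_ms

if __name__ == "__main__":
    # Hypothetical 1 ms round trip per request.
    for queue_depth in (1, 16, 128):
        print(queue_depth, iops_upper_bound(queue_depth, 1.0))
```

The same per-request latency that caps a single-threaded client at a modest figure can look like a very large IOPS number once enough requests are in flight, which is why a highly parallel benchmark says little about what a latency-sensitive workload will see.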

Mark

On 01/18/2016 01:02 PM, Tyler Bishop wrote:
One of the other guys on the list here benchmarked them. They beat every other SSD on the *recommended* tree.

----- Original Message -----
From: "Gregory Farnum" <gfarnum@xxxxxxxxxx>
To: "Tyler Bishop" <tyler.bishop@xxxxxxxxxxxxxxxxx>
Cc: "David" <david@xxxxxxxxxx>, "Ceph Users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, January 18, 2016 2:01:44 PM
Subject: Re:  Again - state of Ceph NVMe and SSDs

On Sun, Jan 17, 2016 at 12:34 PM, Tyler Bishop
<tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:
The changes you are looking for are coming from SanDisk in the upcoming Ceph "Jewel" release.

Based on benchmarks and testing, SanDisk has contributed heavily to the tuning aspects and is promising 90%+ of a drive's native IOPS in the cluster.

Mmmm, they've gotten some very impressive numbers but most people
shouldn't be expecting 90% of an SSD's throughput out of their
workloads. These tests are *very* parallel and tend to run multiple
OSD processes on a single SSD, IIRC.
-Greg


The biggest changes will come from memory allocation on writes.  Latency is going to be a lot lower.


----- Original Message -----
From: "David" <david@xxxxxxxxxx>
To: "Wido den Hollander" <wido@xxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Sunday, January 17, 2016 6:49:25 AM
Subject: Re:  Again - state of Ceph NVMe and SSDs

Thanks Wido, those are good pointers indeed :)
So we just have to make sure the backend storage (SSD/NVMe journals) and the controllers won’t be saturated, and then go with as many RBD images per VM as possible.

Kind Regards,
David Majchrzak

On 16 Jan 2016, at 22:26, Wido den Hollander <wido@xxxxxxxx> wrote:

On 01/16/2016 07:06 PM, David wrote:
Hi!

We’re planning our third Ceph cluster and have been trying to find out
how to maximize IOPS on this one.

Our needs:
* Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
servers)
* Pool for storage of many small files, rbd (probably dovecot maildir
and dovecot index etc)


Not completely NVMe related, but in this case, make sure you use
multiple disks.

For MySQL for example:

- Root disk for OS
- Disk for /var/lib/mysql (data)
- Disk for /var/log/mysql (binary log)
- Maybe even an InnoDB logfile disk

With RBD you gain more performance by sending I/O into the cluster in
parallel. So whenever you can, do so!

Regarding small files, it might be interesting to play with the stripe
count and stripe size there. By default these are 1 and 4MB, but maybe
16 and 256k work better here.
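
To see why the stripe settings matter, here is a small Python sketch of how an image offset maps to a backing object under the striping scheme described in the Ceph documentation (stripe unit, stripe count, object size). It is an illustration, not the librbd code; the function names are my own, and the 4MB object size is assumed to stay at the default in both cases.

```python
KIB = 1024
MIB = 1024 * KIB

def object_for_offset(offset, stripe_unit, stripe_count, object_size):
    """Return the index of the backing object that holds byte `offset`."""
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit         # which stripe unit overall
    stripe_no = block_no // stripe_count     # which row of stripe units
    stripe_pos = block_no % stripe_count     # which column, i.e. object in the set
    object_set_no = stripe_no // stripes_per_object
    return object_set_no * stripe_count + stripe_pos

def objects_touched(length, io_size, stripe_unit, stripe_count, object_size):
    """Set of objects hit by io_size-byte I/Os covering the first `length` bytes."""
    return {
        object_for_offset(off, stripe_unit, stripe_count, object_size)
        for off in range(0, length, io_size)
    }

# 4 KiB I/Os across the first 4 MiB of an image.
default_layout = objects_touched(4 * MIB, 4 * KIB, 4 * MIB, 1, 4 * MIB)
striped_layout = objects_touched(4 * MIB, 4 * KIB, 256 * KIB, 16, 4 * MIB)

if __name__ == "__main__":
    print("default layout touches", len(default_layout), "object(s)")
    print("16 x 256k layout touches", len(striped_layout), "object(s)")
```

With the default layout all of those small I/Os land on one object (and so on one placement group), while the 16 x 256k layout spreads the same range over a whole object set, which is the parallelism being suggested.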

With Dovecot as well, use a different RBD disk for the indexes and a
different one for the Maildir itself.

Ceph excels at parallel performance. That is what you want to aim for.

So I’ve been reading up on:

https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance

and ceph-users from October 2015:

http://www.spinics.net/lists/ceph-users/msg22494.html

We’re planning something like 5 OSD servers, with:

* 4x 1.2TB Intel S3510
* 8x 4TB HDD
* 2x Intel P3700 Series HHHL PCIe 400GB (one for SSD Pool Journal and
one for HDD pool journal)
* 2x 80GB Intel S3510 raid1 for system
* 256GB RAM
* 2x 8 core CPU Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz or better
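
One way to sanity-check a layout like the one above is to compare what a shared journal device can absorb with what the OSDs behind it can write, since with a FileStore journal every write goes through the journal first. The Python sketch below does only that arithmetic. All throughput figures are placeholders I made up for illustration, not measured or vendor-specified numbers for these drives; substitute your own measurements.

```python
def journal_headroom(journal_write_mbps, osd_count, per_osd_write_mbps):
    """Ratio of journal write bandwidth to the aggregate OSD write bandwidth.

    A value below 1.0 suggests the shared journal saturates before the OSDs
    do under sustained sequential writes; above 1.0 the OSDs are the limit.
    """
    demand = osd_count * per_osd_write_mbps
    return journal_write_mbps / demand

if __name__ == "__main__":
    # PLACEHOLDER numbers only, in MB/s.
    hdd_pool = journal_headroom(journal_write_mbps=1000, osd_count=8,
                                per_osd_write_mbps=150)
    ssd_pool = journal_headroom(journal_write_mbps=1000, osd_count=4,
                                per_osd_write_mbps=400)
    print(f"HDD pool journal headroom: {hdd_pool:.2f}")
    print(f"SSD pool journal headroom: {ssd_pool:.2f}")
```

This only covers sequential bandwidth; small random writes are bounded by IOPS and latency instead, so treat the ratio as a first-pass check rather than a sizing rule.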

This cluster will probably run Hammer LTS unless there are huge
improvements in Infernalis when dealing with 4k IOPS.

The first link above hints at awesome performance. The second one, from
the list, not so much yet.

Is anyone running Hammer or Infernalis with a setup like this?
Is it a sane setup?
Will we become CPU constrained or can we just throw more RAM on it? :D

Kind Regards,
David Majchrzak


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on