Re: Again - state of Ceph NVMe and SSDs

On Sun, Jan 17, 2016 at 12:34 PM, Tyler Bishop
<tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:
> The changes you are looking for are coming from SanDisk in the upcoming Ceph "Jewel" release.
>
> Based on benchmarks and testing, SanDisk has contributed heavily to the tuning work and is promising 90%+ of a drive's native IOPS in the cluster.

Mmmm, they've gotten some very impressive numbers, but most people
shouldn't expect 90% of an SSD's throughput from their workloads.
Those tests are *very* parallel and, IIRC, tend to run multiple
OSD processes on a single SSD.
-Greg

>
> The biggest changes will come from the memory allocation path on writes. Latency is going to be a lot lower.
>
>
> ----- Original Message -----
> From: "David" <david@xxxxxxxxxx>
> To: "Wido den Hollander" <wido@xxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Sunday, January 17, 2016 6:49:25 AM
> Subject: Re:  Again - state of Ceph NVMe and SSDs
>
> Thanks Wido, those are good pointers indeed :)
> So we just have to make sure the backend storage (SSD/NVMe journals) and the controllers won’t be saturated, and then go with as many RBDs per VM as possible.
>
> Kind Regards,
> David Majchrzak
>
> 16 jan 2016 kl. 22:26 skrev Wido den Hollander <wido@xxxxxxxx>:
>
>> On 01/16/2016 07:06 PM, David wrote:
>>> Hi!
>>>
>>> We’re planning our third ceph cluster and been trying to find how to
>>> maximize IOPS on this one.
>>>
>>> Our needs:
>>> * Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
>>> servers)
>>> * Pool for storage of many small files, rbd (probably dovecot maildir
>>> and dovecot index etc)
>>>
>>
>> Not completely NVMe related, but in this case, make sure you use
>> multiple disks.
>>
>> For MySQL for example:
>>
>> - Root disk for OS
>> - Disk for /var/lib/mysql (data)
>> - Disk for /var/log/mysql (binary log)
>> - Maybe even a InnoDB logfile disk
>>
>> With RBD you gain more performance by sending I/O into the cluster in
>> parallel, so whenever you can, do so!
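>> For example, the per-purpose disks above could be created as separate
>> RBD images, one per MySQL I/O stream. Pool names, image names, and
>> sizes below are placeholders, not anything from a real setup:

```shell
# One RBD image per MySQL I/O stream, so requests fan out across the
# cluster in parallel. Sizes are in MB; names are illustrative.
rbd create ssd-pool/mysql-data --size 204800        # /var/lib/mysql (data)
rbd create ssd-pool/mysql-binlog --size 51200       # /var/log/mysql (binary log)
rbd create ssd-pool/mysql-innodb-log --size 10240   # separate InnoDB logfile disk
# Attach each image to the VM and mount it on its own mount point.
```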
>>
>> Regarding small files, it might be interesting to play with the stripe
>> count and stripe size there. By default these are 1 and 4MB, but maybe
>> 16 and 256k would work better here.
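>> A sketch of what that would look like at image-creation time. Note
>> that non-default striping needs image format 2; the pool and image
>> names are placeholders:

```shell
# 16 stripes of 256 KB each spread what would have been one 4 MB object's
# worth of small-file I/O across 16 objects, and so across more OSDs in
# parallel. --stripe-unit is in bytes (262144 = 256 KB).
rbd create mail-pool/dovecot-maildir --size 102400 \
    --image-format 2 --stripe-unit 262144 --stripe-count 16
```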
>>
>> With Dovecot as well, use a different RBD disk for the indexes and a
>> different one for the Maildir itself.
>>
>> Ceph excels at parallel performance. That is what you want to aim for.
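>> One way to see how far that parallelism takes you is an fio run against
>> an image via librbd (assuming your fio build has rbd support; the pool
>> and image names are placeholders):

```shell
# 4K random writes at high queue depth -- roughly the MySQL-style workload.
# iodepth and numjobs are the knobs that exercise Ceph's parallelism.
fio --name=rbd-4k-randwrite --ioengine=rbd --pool=ssd-pool \
    --rbdname=mysql-data --rw=randwrite --bs=4k --iodepth=32 \
    --numjobs=4 --runtime=60 --time_based --direct=1 --group_reporting
```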
>>
>>> So I’ve been reading up on:
>>>
>>> https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance
>>>
>>> and ceph-users from october 2015:
>>>
>>> http://www.spinics.net/lists/ceph-users/msg22494.html
>>>
>>> We’re planning something like 5 OSD servers, with:
>>>
>>> * 4x 1.2TB Intel S3510
>>> * 8x 4TB HDD
>>> * 2x Intel P3700 Series HHHL PCIe 400GB (one for SSD Pool Journal and
>>> one for HDD pool journal)
>>> * 2x 80GB Intel S3510 raid1 for system
>>> * 256GB RAM
>>> * 2x 8 core CPU Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz or better
>>>
>>> This cluster will probably run Hammer LTS unless there are huge
>>> improvements in Infernalis when dealing with 4K IOPS.
>>>
>>> The first link above hints at awesome performance; the second one,
>>> from the list, not so much yet.
>>>
>>> Is anyone running Hammer or Infernalis with a setup like this?
>>> Is it a sane setup?
>>> Will we become CPU constrained or can we just throw more RAM on it? :D
>>>
>>> Kind Regards,
>>> David Majchrzak
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>