Re: RFC Bluestore-Cluster of SAMSUNG PM863a


 



2018-02-02 12:44 GMT+01:00 Richard Hesketh <richard.hesketh@xxxxxxxxxxxx>:
On 02/02/18 08:33, Kevin Olbrich wrote:
> Hi!
>
> I am planning a new Flash-based cluster. In the past we used SAMSUNG PM863a 480G as journal drives in our HDD cluster.
> After a lot of tests with luminous and bluestore on HDD clusters, we plan to re-deploy our whole RBD pool (OpenNebula cloud) using these disks.
>
> As far as I understand, it would be best to skip a separate journal / WAL device and just deploy each OSD 1-by-1 on its own disk. This would have the following pros (correct me if I am wrong):
> - maximum performance, as the journal is spread across all devices
> - a lost drive does not affect any other drive
>
> Currently we are on CentOS 7 with the elrepo 4.4.x kernel. We plan to migrate to Ubuntu 16.04.3 with HWE (kernel 4.10).
> Clients will be Fedora 27 + OpenNebula.
>
> Any comments?
>
> Thank you.
>
> Kind regards,
> Kevin

There is only a real advantage to separating the DB/WAL from the main data if they're going to be hosted on a device which is appreciably faster than the main storage. Since you're going all SSD, it makes sense to deploy each OSD all-in-one; as you say, you don't bottleneck on any one disk, and it also offers you more maintenance flexibility as you will be able to easily move OSDs between hosts if required. If you wanted to start pushing performance more, you'd be looking at putting NVMe disks in your hosts for DB/WAL.
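For reference, a minimal ceph-volume sketch of the two layouts being discussed here (Luminous syntax; the device paths are placeholders, not taken from this thread):

    # All-in-one BlueStore OSD: data, RocksDB and WAL all live on the same SSD
    ceph-volume lvm create --bluestore --data /dev/sdb

    # Split layout: data on the SSD, DB (and with it the WAL) on a faster NVMe partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1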

We do have some Intel P3700 NVMe (PCIe) disks, but each host will be serving 10 OSDs, and the combined sync write speed of the Samsungs was better than that of the single NVMe drive (we only ran some short fio benchmarks, no real Ceph test, so this could look different now; a sketch of that kind of test is below).
If the performance gain is only slight, sticking to a single-OSD failure domain is better for maintenance, as this new cluster will not be monitored 24/7 by our staff while the migration is in progress.
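For context, a sketch of the kind of synchronous-write test usually run against journal/WAL SSDs; /dev/sdX is a placeholder, and the job writes directly to the device, so only point it at a disk whose data may be destroyed:

    # 4k sync writes at queue depth 1, the pattern a WAL/journal device sees
    fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based --group_reporting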
 
FYI, the 16.04 HWE kernel has currently rolled on over to 4.13.

Has anyone tested this kernel branch with Ceph? Any performance impact? If I understood the docs correctly, Ubuntu is a well-tested platform for Ceph, so this should already have been tested (?).
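For anyone following along, a sketch of how the HWE stack is enabled and checked on 16.04 (standard Ubuntu package name, not something verified in this thread):

    # Install the hardware-enablement (HWE) kernel stack on Ubuntu 16.04
    sudo apt-get install --install-recommends linux-generic-hwe-16.04

    # After a reboot, confirm which kernel is actually running
    uname -r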
 
May I ask why you are using the elrepo kernel with CentOS?
AFAIK, Red Hat is backporting all Ceph features to the 3.10 kernel. Am I wrong?

Before we moved from OpenStack to OpenNebula in early 2017, we had some problems with krbd / fuse (missing features, etc.).
We then decided to move from 3.10 to 4.4, which solved all of those problems, and we noticed a small performance improvement.
Maybe these problems have been solved by now; we hit them when we rolled out Mitaka.
We have not changed our deployment scripts since then, which is why we are still on kernel-ml.
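For anyone hitting the same krbd feature mismatch today, a sketch of the usual workarounds; pool and image names are placeholders:

    # Create images with only the features old krbd versions understand
    rbd create --size 10G --image-feature layering rbd/test-image

    # Or strip unsupported features from an existing image before mapping it
    rbd feature disable rbd/test-image object-map fast-diff deep-flatten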

Kind regards,
Kevin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
