We are using Ceph on NFS for VMware – we are using SSD tiers in front of SATA as well as some direct SSD pools. The datastores are just XFS file systems on RBD, managed by a Pacemaker cluster for failover. Lessons so far: large datastores quickly run out of IOPS and workloads compete for performance, so you are better off with many smaller RBDs (say 1TB) to spread the load out. Tuning up the NFS server thread count also seems to help.
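For illustration, a rough sketch of carving the capacity into several smaller images with the python-rbd bindings – the pool name, image names and count here are placeholders, not our actual layout:

    import rados
    import rbd

    # Connect with the default config/keyring; adjust conffile as needed.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')          # placeholder pool name
        try:
            rbd_inst = rbd.RBD()
            size = 1 * 1024 ** 4                   # 1 TiB per datastore image
            # Many small images instead of one huge one, so each NFS
            # datastore gets its own RBD and the IOPS load is spread out.
            for i in range(1, 9):
                rbd_inst.create(ioctx, 'vmware-ds-%02d' % i, size)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Each image then gets an XFS file system and its own NFS export, with Pacemaker deciding which gateway node maps and exports it.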
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Osama Hasebou

Hi Nick,

Thanks for replying! If Ceph is combined with OpenStack, does that mean that while OpenStack writes are happening, the data is not fully synced (as in written to disk) before more data is accepted, so it is effectively acting as async? In that scenario there is a chance of data loss if things go bad, e.g. a power outage or something like that?

As for the slow operations, reading is quite fine when I compare it to a SAN storage system connected to VMware. It is writing data, small chunks or big ones, that suffers when trying to use the sync option with fio for benchmarking. In that case, I wonder, is no one using Ceph with VMware in a production environment?

Cheers.

Regards,

Hi Osama,

This is a known problem with many software-defined storage stacks, but potentially slightly worse with Ceph due to extra overheads. Sync writes have to wait until all copies of the data are written to disk by the OSDs and acknowledged back to the client. The extra network hops for replication and the NFS gateway add significant latency, which drives up the time it takes to complete small writes, and the Ceph code itself also takes time to process each IO request.

What particular operations are you finding slow? Storage vMotions are just bad, and I don't think there is much that can be done about them, as they are split into lots of 64KB IOs. One thing you can try is to force the CPUs on your OSD nodes to run in the C1 C-state and force their minimum frequency to 100%; this can have quite a large impact on latency. Also, you don't specify your network, but 10G is a must.

Nick
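To put rough numbers on the sync-write cost described above, a minimal probe you could run against a file on the NFS-mounted datastore is sketched below – the path is a placeholder, and 64 KiB is used only because that is the IO size storage vMotion reportedly ends up issuing:

    import os
    import time

    PATH = '/mnt/datastore01/latency-probe.bin'    # placeholder: file on the NFS datastore
    BLOCK = b'\0' * (64 * 1024)                    # 64 KiB writes
    COUNT = 200

    # O_SYNC makes every write wait until the data is stable, i.e. until the
    # NFS gateway and all OSD replicas have acknowledged it.
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        samples = []
        for _ in range(COUNT):
            start = time.perf_counter()
            os.write(fd, BLOCK)
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(PATH)

    samples.sort()
    print('median: %.2f ms' % (samples[COUNT // 2] * 1000.0))
    print('worst:  %.2f ms' % (samples[-1] * 1000.0))

Multiplying the median by the number of 64 KiB chunks in a VMDK gives a feel for why a storage vMotion over sync NFS takes as long as it does.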
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Osama Hasebou

Hi Everyone,

We started testing the idea of using Ceph storage with VMware. The idea was to provide Ceph storage to VMware through OpenStack, by creating a virtual machine from Ceph + OpenStack which acts as an NFS gateway, and then mounting that storage on top of the VMware cluster.

When mounting the NFS exports with the sync option, we noticed a huge degradation in performance, which makes it too slow to use in production. The async option makes it much better, but then there is the risk that if a failure happens, e.g. a power outage, some data might be lost.

Now, I understand that some people in the Ceph community are using Ceph with VMware through NFS gateways, so if you could kindly shed some light on your experience, and whether you use it in production, that would be great – in particular, how did you handle the sync/async options and keep write performance up?

Thank you!

Regards,
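For what it is worth, the sync/async gap is easy to reproduce outside fio. A rough sketch is below: the sync case opens the file with O_SYNC so every small write must be stable before the next one is issued, while the async case buffers the writes and flushes once at the end – anything not yet flushed in that second case is exactly the data at risk if the gateway dies. The path is a placeholder for a file on the NFS datastore.

    import os
    import time

    PATH = '/mnt/datastore01/sync-vs-async.bin'    # placeholder path
    BLOCK = b'\0' * 4096                           # 4 KiB writes, the painful case
    COUNT = 1000

    def run(sync_each_write):
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        if sync_each_write:
            flags |= os.O_SYNC                     # commit every single write
        fd = os.open(PATH, flags, 0o644)
        start = time.perf_counter()
        try:
            for _ in range(COUNT):
                os.write(fd, BLOCK)
            os.fsync(fd)                           # async case: one flush at the end
        finally:
            os.close(fd)
        return time.perf_counter() - start

    print('sync  (O_SYNC per write): %.2f s' % run(True))
    print('async (buffered + fsync): %.2f s' % run(False))
    os.unlink(PATH)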
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com