Re: VMware + Ceph using NFS sync/async ?


 



> I'd be interested in details of this small versus large bit.

The smaller shares are simply a way to distribute the workload over more RBDs so that a single RBD device doesn't become the bottleneck. The size itself doesn't particularly matter; the idea is to spread VMs across many shares rather than a few large datastores.
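
If it helps to picture it, here's a minimal sketch (using the python-rbd bindings) of carving out a handful of smaller images to back separate NFS shares - the pool name, image names and sizes are placeholder assumptions, not our actual layout:

    #!/usr/bin/env python
    # Carve out several smaller RBD images to back separate NFS shares,
    # using the python-rbd bindings. Pool name, image names and sizes are
    # placeholder assumptions.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')          # assumed pool name
        try:
            share_size = 2 * 1024 ** 4             # 2 TiB per share (assumed)
            for i in range(8):                     # eight smaller shares
                rbd.RBD().create(ioctx, 'nfs-share-%02d' % i, share_size)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Each image then gets formatted with XFS and exported from its own NFS share in the usual way.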

We originally started with 10TB shares, just because we had the space, but found that performance ran out before capacity did. It's become apparent that the limitation is at the RBD level, particularly with writes. Under heavy usage, say during VMware snapshot backups, VMs get hit by higher latency to the point where some become unresponsive for short periods. The Ceph cluster itself has plenty of headroom and handles far higher aggregate workloads, but individual RBD devices just seem to hit a wall.

For example, one of our shares will happily sit there all day doing 300-400 read IOPS at very low latency.  During the backup window we get heavier writes as snapshots are created and cleaned up.  That increased write activity pushes the RBD to 100% busy and read latencies climb from 1-2ms to 20-30ms, even though the number of reads doesn't change much.  The devices can handle more in absolute terms - I see periods of up to 1800 read and 800 write IOPS.

There is probably more tuning that can be applied at the XFS/NFS level, but for the moment that’s the direction we are taking - creating more shares.

>
> Would you say that the IOPS starvation is more an issue of the large
> filesystem than the underlying Ceph/RBD?

As above - I think it's more an IOPS limitation at the RBD device level, most likely sync write latency limiting the number of effective IOs.  XFS may be a factor as well, but I haven't had the chance to dig into that further.
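
To put rough numbers on that, the cap falls straight out of the per-write latency and the queue depth - purely illustrative figures:

    # Effective IOPS from a sync-write-bound stream is capped by per-write
    # latency times the number of writes in flight. Illustrative figures only.
    for latency_ms in (1.0, 2.0, 5.0):
        for queue_depth in (1, 4, 16):
            iops = queue_depth * 1000.0 / latency_ms
            print('%.1f ms per sync write, QD %2d -> ~%5.0f IOPS'
                  % (latency_ms, queue_depth, iops))

So once each write costs a few milliseconds end to end, a lightly-queued share simply can't push many IOs per second no matter how much the cluster has in reserve.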

> With a cache-tier in place I'd expect all hot FS objects (inodes, etc) to be
> there and thus be as fast as it gets from a Ceph perspective.

Yeah - the cache tier takes a fair bit of the heat and improves response times considerably for the SATA pools; it makes a significant difference.  Images on the SSD-only pool behave in a similar way but reach a much higher performance level before they start showing issues.

> OTOH lots of competing accesses to same journal, inodes would be a
> limitation inherent to the FS.

There is likely tuning to be done at the XFS level, but the stats from the RBD devices themselves show the latencies going up.  There may be additional impact further up the stack, but the underlying device is where the change in performance shows.
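
If anyone wants to watch the same thing on their own rbd devices, a quick /proc/diskstats sampler along these lines (device name and interval are assumptions) gives iostat-style IOPS, average latency and %busy:

    #!/usr/bin/env python
    # iostat-style sampler for a single block device (e.g. rbd0).
    # Device name and interval are assumptions - adjust as needed.
    import sys
    import time

    def read_stats(dev):
        # /proc/diskstats fields: 3=reads, 6=ms reading, 7=writes,
        # 10=ms writing, 12=ms the device was busy doing IO
        with open('/proc/diskstats') as f:
            for line in f:
                fields = line.split()
                if fields[2] == dev:
                    return (int(fields[3]), int(fields[6]), int(fields[7]),
                            int(fields[10]), int(fields[12]))
        raise SystemExit('device %s not found' % dev)

    dev = sys.argv[1] if len(sys.argv) > 1 else 'rbd0'
    interval = 5.0
    prev = read_stats(dev)
    while True:
        time.sleep(interval)
        cur = read_stats(dev)
        dr, drt, dw, dwt, dbusy = [c - p for c, p in zip(cur, prev)]
        r_lat = drt / float(dr) if dr else 0.0      # avg ms per read
        w_lat = dwt / float(dw) if dw else 0.0      # avg ms per write
        print('%s: %5.0f r/s (%.1f ms)  %5.0f w/s (%.1f ms)  %3.0f%% busy'
              % (dev, dr / interval, r_lat, dw / interval, w_lat,
                 100.0 * dbusy / (interval * 1000.0)))
        prev = cur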

>
> Christian
>
> >
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Osama Hasebou
> > Sent: Wednesday, 16 August 2017 10:34 PM
> > To: nick@xxxxxxxxxx
> > Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> > Subject: Re:  VMware + Ceph using NFS sync/async ?
> >
> > Hi Nick,
> >
> > Thanks for replying! If Ceph is combined with OpenStack, does that mean
> > that when OpenStack writes are happening the data is not fully synced (as
> > in written to disk) before more data is accepted, i.e. it acts as async?
> > In that scenario there is a chance of data loss if things go bad, e.g. a
> > power outage or something like that?
> >
> > As for the slow operations, reading is quite fine when I compare it to a
> > SAN storage system connected to VMware. It is writing data, whether small
> > chunks or big ones, that suffers when using the sync option with FIO for
> > benchmarking.
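
(A small sketch of the kind of test FIO's sync option is doing - timing fsync'd 4K writes on the NFS mount; the path and counts below are assumptions:)

    #!/usr/bin/env python
    # Time fsync'd 4K writes on a mount point - roughly what an NFS export
    # with the 'sync' option forces on every client write.
    # The path and counts are assumptions; point it at your NFS mount.
    import os
    import time

    path = '/mnt/nfs-share/syncwrite.test'   # assumed test file location
    block = b'\0' * 4096
    count = 1000

    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    latencies = []
    for _ in range(count):
        t0 = time.time()
        os.write(fd, block)
        os.fsync(fd)                         # wait for stable storage
        latencies.append((time.time() - t0) * 1000.0)
    os.close(fd)
    os.unlink(path)

    latencies.sort()
    avg = sum(latencies) / count
    print('4K sync writes: avg %.2f ms, p99 %.2f ms, ~%.0f IOPS at QD=1'
          % (avg, latencies[int(count * 0.99) - 1], 1000.0 / avg))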
> >
> > In that case, I wonder, is no one using Ceph with VMware in a production
> > environment?
> >
> > Cheers.
> >
> > Regards,
> > Ossi
> >
> >
> >
> > Hi Osama,
> >
> > This is a known problem with many software-defined storage stacks, but
> > potentially slightly worse with Ceph due to extra overheads. Sync writes
> > have to wait until all copies of the data are written to disk by the OSDs
> > and acknowledged back to the client. The extra network hops for
> > replication and NFS gateways add significant latency, which impacts the
> > time it takes to carry out small writes. The Ceph code also takes time to
> > process each IO request.
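
(To make the hop count concrete, a back-of-the-envelope latency budget; every figure below is an illustrative assumption, not a measurement:)

    # Back-of-the-envelope latency budget for one small sync write going
    # client -> NFS gateway -> Ceph. All figures are illustrative assumptions.
    hops_ms = {
        'ESXi -> NFS gateway (network)':        0.2,
        'NFS server + filesystem processing':   0.3,
        'gateway -> primary OSD (network)':     0.2,
        'primary OSD commit':                   1.0,
        'replication to 2 secondaries':         1.3,
        'acks back up the stack':               0.2,
    }
    total = sum(hops_ms.values())
    for hop, ms in hops_ms.items():
        print('  %-38s %.1f ms' % (hop, ms))
    print('total ~%.1f ms per write -> ~%.0f IOPS at queue depth 1'
          % (total, 1000.0 / total))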
> >
> > What particular operations are you finding slow? Storage vMotions are
> > just bad, and I don't think there is much that can be done about them, as
> > they are split into lots of 64KB IOs.
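
(The 64KB split is what makes it hurt - a quick bit of arithmetic with assumed sizes and per-IO latency:)

    # A VMDK moved in 64KB chunks turns into a very large number of small
    # IOs; sizes and per-IO latency below are illustrative assumptions.
    vmdk_bytes = 100 * 1024**3     # 100 GiB virtual disk (assumed)
    io_size    = 64 * 1024         # 64 KiB per IO
    lat_ms     = 1.0               # assumed per-IO sync write latency

    ios = vmdk_bytes // io_size
    minutes_serial = ios * lat_ms / 1000.0 / 60.0
    print('%d IOs; ~%.0f minutes if issued one at a time'
          % (ios, minutes_serial))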
> >
> > One thing you can try is to force the CPUs on your OSD nodes to stay in
> > the C1 C-state and set their minimum frequency to 100%. This can have
> > quite a large impact on latency. Also, you don't specify your network,
> > but 10G is a must.
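
(One way to do that from userspace is via the standard Linux PM QoS and cpufreq interfaces; a rough sketch, run as root - check that the paths exist on your distribution before relying on it:)

    #!/usr/bin/env python
    # Keep CPUs out of deep C-states via /dev/cpu_dma_latency and pin the
    # cpufreq governor to 'performance'. Run as root; the paths are the
    # standard Linux PM QoS / cpufreq interfaces, but verify them on your
    # distribution before relying on this.
    import glob
    import signal
    import struct

    # A 0us latency request is only honoured while this fd stays open.
    qos = open('/dev/cpu_dma_latency', 'wb', 0)
    qos.write(struct.pack('<I', 0))

    # Set every core's scaling governor to 'performance'.
    for gov in glob.glob('/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor'):
        with open(gov, 'w') as f:
            f.write('performance')

    print('latency/frequency settings applied; keep this process running')
    signal.pause()   # hold the PM QoS request until killed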
> >
> > Nick
> >
> >
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Osama Hasebou
> > Sent: 14 August 2017 12:27
> > To: ceph-users
> > <ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>>
> > Subject:  VMware + Ceph using NFS sync/async ?
> >
> > Hi Everyone,
> >
> > We started testing the idea of using Ceph storage with VMware. The idea
> > is to provide Ceph storage to VMware through OpenStack, by creating a
> > virtual machine (backed by Ceph + OpenStack) that acts as an NFS gateway,
> > and then mounting that storage on the VMware cluster.
> >
> > When mounting the NFS exports with the sync option we noticed a huge
> > degradation in performance, which makes it too slow to use in production.
> > The async option makes it much better, but then there is the risk that,
> > should a failure happen, some data might be lost.
> >
> > Now I understand that some people in the Ceph community are using Ceph
> > with VMware via NFS gateways. If you could kindly shed some light on your
> > experience - whether you use it in production, how you handled the
> > sync/async trade-off, and how you kept up write performance - that would
> > be great.
> >
> >
> > Thank you!!!
> >
> > Regards,
> > Ossi
> >
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





