Re: EC Pools w/ RBD - IOPs

"Anthony Brandelli (abrandel)" <abrandel@xxxxxxxxx> · Tue, 18 Feb 2020 21:19:48 +0000

Added a fifth OSD node. Cluster now looks something like:

3x mons (2x 10G, 2x E5-2690 v2, 256GB RAM)
5x OSD (2x 10G, 2x E5-2690 v2, 256GB-385GB RAM, 12x Samsung SM1625 SSDs)

With the fifth node added and the EC profile changed to k=3, m=2, average random write latency went up to 16ms.

What kind of latencies are people seeing in their EC clusters?

From: "Anthony Brandelli (abrandel)" <abrandel@xxxxxxxxx>
Date: Thursday, February 13, 2020 at 10:17 AM
To: Martin Verges <martin.verges@xxxxxxxx>
Cc: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
Subject: Re:  EC Pools w/ RBD - IOPs

I should mention this is solely meant as a test cluster, and unfortunately I only have four OSD nodes in it. I guess I’ll go see if I can dig up another node so I can better mirror what might eventually go to production.

I would imagine that latency is only going to increase as we increase k though, no?
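
One rough way to reason about this question: a write completes only once the slowest of its k+m shards has been written, so the more shards there are, the more likely one of them is slow. The sketch below is a toy model, not a Ceph measurement. It assumes (purely for illustration) that every shard's write latency is an independent exponential with the same mean, in which case the expected slowest of n shards is the mean times the n-th harmonic number. The function name and the 1.0 ms mean are hypothetical, and the model ignores everything else EC adds, such as encoding cost.

```python
from fractions import Fraction


def expected_slowest(n_shards: int, mean_ms: float) -> float:
    """Expected maximum of n iid exponential latencies with the given mean.

    For exponential draws, E[max] = mean * H_n, where H_n = 1 + 1/2 + ... + 1/n.
    """
    harmonic = sum(Fraction(1, i) for i in range(1, n_shards + 1))
    return mean_ms * float(harmonic)


if __name__ == "__main__":
    mean_ms = 1.0  # hypothetical per-shard mean, for illustration only
    for k, m in [(2, 1), (3, 2), (4, 2), (8, 3)]:
        slowest = expected_slowest(k + m, mean_ms)
        print(f"k={k} m={m}: slowest shard ~{slowest:.2f}x the per-shard mean")
```

Under these assumptions the growth is slow (logarithmic in k+m), so this effect alone would not account for a large jump in latency.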

From: Martin Verges <martin.verges@xxxxxxxx>
Date: Thursday, February 13, 2020 at 10:10 AM
To: "Anthony Brandelli (abrandel)" <abrandel@xxxxxxxxx>
Cc: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
Subject: Re:  EC Pools w/ RBD - IOPs

Hello,

Please do not even think about using an EC pool with k=2, m=1. See other posts on this list; just don't.

EC works quite well, and we have a lot of users with EC-based VMs, often with Proxmox (RBD) or VMware (iSCSI) hypervisors.
Performance depends on the hardware and is definitely slower than replica, but EC is cost-efficient and more than OK for most workloads. If you split generic VMs from databases (or similar workloads), you can save a lot of money with EC.
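
To put a number on the cost argument, here is a minimal sketch of the raw-capacity overhead for the layouts mentioned in this thread. It only uses the standard (k+m)/k ratio for erasure coding and the replica count for replication; the function name is hypothetical, and it ignores any other space overhead in a real cluster.

```python
def raw_per_usable(k: int, m: int) -> float:
    """Raw bytes stored per usable byte for an EC profile with k data and m coding chunks."""
    return (k + m) / k


# Layouts discussed in this thread; a replicated pool stores `size` full copies.
layouts = {
    "3x replica": 3.0,
    "EC k=2, m=1": raw_per_usable(2, 1),
    "EC k=3, m=2": raw_per_usable(3, 2),
}

for name, overhead in layouts.items():
    usable_pct = 100.0 / overhead
    print(f"{name}: {overhead:.2f}x raw per usable byte, {usable_pct:.0f}% of raw is usable")
```

So k=3, m=2 stores about 1.67 raw bytes per usable byte, against 3 for triple replication, while tolerating the loss of two chunks.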

--
Martin Verges
Managing director

On Thu, 13 Feb 2020 at 17:52, Anthony Brandelli (abrandel) <abrandel@xxxxxxxxx> wrote:
Hi Ceph Community,

Wondering what experiences, good or bad, you have had with EC pools for IOPS-intensive workloads (e.g., roughly 4K random IO from things like VMware ESXi). I realize that EC pools trade higher latency and lower IOPS for more usable capacity, but in my testing the tradeoff for small IO seems to be much worse than I had anticipated.

On an all-flash 3x replicated pool we’re seeing 45k random read and 35k random write IOPS, testing with fio on a client living on an iSCSI LUN presented to an ESXi host. Average latencies for these ops are 4.2ms and 5.5ms respectively, which is respectable at an IO depth of 32.

Take this same setup with an EC pool (k=2, m=1; tested with both ISA and jerasure, and ISA does give better performance for our use case) and we see 30k random read and 16k random write IOPS. Random reads see 6.5ms average, while random writes suffer with 12ms average.
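
A quick sanity check on the figures quoted above, using Little's law (average requests in flight = throughput x average latency). The fio job count is not stated in this post, so the in-flight number computed here is inferred from the quoted IOPS and latencies, not reported by the test.

```python
# (IOPS, average latency in ms) as quoted in this message.
results = {
    "3x replica, random read": (45_000, 4.2),
    "3x replica, random write": (35_000, 5.5),
    "EC k=2 m=1, random read": (30_000, 6.5),
    "EC k=2 m=1, random write": (16_000, 12.0),
}


def in_flight(iops: float, latency_ms: float) -> float:
    """Little's law: mean outstanding I/Os = arrival rate * mean time in system."""
    return iops * latency_ms / 1000.0


for name, (iops, latency_ms) in results.items():
    print(f"{name}: ~{in_flight(iops, latency_ms):.0f} I/Os in flight")
```

All four cases come out at roughly 190 to 195 outstanding I/Os, which would be consistent with a fixed total queue depth across the runs (for example, several fio jobs each at an IO depth of 32, which is an assumption). If so, the lower IOPS on the EC pool is just the higher per-op latency seen from a fixed number of outstanding requests.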

Are others using EC pools seeing similar hits to random writes with small IOs? Any way to improve this?

Thanks,
Anthony
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx