Re: Looking for experience

Ed Kalk <ekalk@xxxxxxxxxx> · Thu, 9 Jan 2020 16:13:19 -0600

It sounds like an I/O bottleneck (either max IOPS or max throughput) in 
the making.

If you are looking for cold-storage archival data only, then it may be 
OK (if it doesn't matter how long it takes to write the data).

If this is production data with any sort of IOPS load or data change 
rate, I'd be concerned.

Spinning disks that are too big will get killed on seek times. Too many 
large spinners will also likely bottleneck the I/O controller. It would 
be better to use more, cheaper nodes to get many more, smaller disks 
(2TB max). More disks, more I/O controllers, more motherboards = more 
perf. Think "scale out" in number of nodes, not "scale up" of the 
individual nodes.
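
To put rough numbers on the scale-out argument, here is a back-of-envelope
sketch in Python. It assumes about 100 random IOPS per 7200 rpm spindle
regardless of capacity, which is a rule-of-thumb figure and not a
measurement from this thread. The function name and the 144 TB example
capacity (12 x 12TB, one node of the proposed layout) are illustrative
only.

```python
import math

# Assumption (not from the thread): a 7200 rpm HDD delivers roughly the
# same number of random IOPS no matter how large it is.
ASSUMED_IOPS_PER_SPINDLE = 100


def spindle_budget(raw_tb: float, disk_tb: float) -> dict:
    """Return disk count, aggregate IOPS and IOPS per TB for a raw capacity."""
    disks = math.ceil(raw_tb / disk_tb)
    total_iops = disks * ASSUMED_IOPS_PER_SPINDLE
    return {
        "disks": disks,
        "total_iops": total_iops,
        "iops_per_tb": total_iops / raw_tb,
    }


if __name__ == "__main__":
    raw_tb = 144  # hypothetical: 12 x 12TB disks
    for disk_tb in (12, 2):
        print(f"{disk_tb}TB disks:", spindle_budget(raw_tb, disk_tb))
```

Under that assumption the same raw capacity built from 2TB disks has six
times the spindles, and therefore six times the aggregate random IOPS, of
the 12TB layout. Real numbers depend on the drives, controllers and
workload.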

-Ed

Software Defined Storage Engineer

On 1/9/2020 3:52 PM, Stefan Priebe - Profihost AG wrote:
As a starting point the current idea is to use something like:

4-6 nodes with 12x 12TB disks each
128G Memory
AMD EPYC 7302P 3GHz, 16C/32T
128GB RAM

Something to discuss is

- EC or go with 3 replicas. We'll use bluestore with compression.
- Do we need something like Intel Optane for WAL / DB or not?
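
As a rough aid for the EC versus 3-replica question, the following Python
sketch compares usable capacity for the proposed 4-node and 6-node
layouts. The EC profiles (2+2 and 4+2) are illustrative assumptions, not
something proposed in this thread, and the figures ignore compression,
BlueStore overhead and the free space a cluster needs for recovery. Note
that with a host-level failure domain an EC profile needs at least k+m
hosts, so 4+2 only fits the 6-node case.

```python
def usable_tb_replica(raw_tb: float, copies: int) -> float:
    """Usable capacity when every object is stored `copies` times."""
    return raw_tb / copies


def usable_tb_ec(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for an erasure-coded pool with k data and m coding chunks."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    disks_per_node, disk_tb = 12, 12  # from the proposal: 12x 12TB per node
    # (nodes, k, m) with hypothetical EC profiles chosen so that k + m <= nodes
    for nodes, k, m in ((4, 2, 2), (6, 4, 2)):
        raw = nodes * disks_per_node * disk_tb
        print(
            f"{nodes} nodes, {raw} TB raw: "
            f"3 replicas -> {usable_tb_replica(raw, 3):.0f} TB, "
            f"EC {k}+{m} -> {usable_tb_ec(raw, k, m):.0f} TB"
        )
```

This only covers the capacity side of the decision. Write latency, rebuild
traffic and small-write behaviour differ between the two schemes and are
not modelled here.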

Since we started using Ceph we've mostly relied on SSDs, so we have no
knowledge about HDDs in place.

Greets,
Stefan
On 09.01.20 at 16:49, Stefan Priebe - Profihost AG wrote:
On 09.01.2020 at 16:10, Wido den Hollander <wido@xxxxxxxx> wrote:

On 1/9/20 2:27 PM, Stefan Priebe - Profihost AG wrote:
Hi Wido,
On 09.01.20 at 14:18, Wido den Hollander wrote:

On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote:
On 09.01.20 at 13:39, Janne Johansson wrote:
    I'm currently trying to work out a concept for a Ceph cluster which can
    be used as a target for backups which satisfies the following
    requirements:

    - approx. write speed of 40,000 IOPS and 2,500 MB/s

You might need to have a large (at least non-1) number of writers to get
to that sum of operations, as opposed to trying to reach it with one
single stream written from one single client.

We are aiming for about 100 writers.
So if I read it correctly, the writes will be 64k each.
Maybe ;-) see below
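
The 64k estimate follows from dividing the target throughput by the
target operation rate. A small Python sketch of that arithmetic, using the
40,000 IOPS, 2500 Mbyte/s and 100-writer figures from this thread, is
below. The function names are illustrative, and decimal units
(1 MB = 1000 KB) are an assumption.

```python
def average_write_kb(throughput_mb_s: float, iops: float) -> float:
    """Average size of one write in KB, assuming 1 MB = 1000 KB."""
    return throughput_mb_s * 1000 / iops


def per_writer_share(total: float, writers: int) -> float:
    """Even split of an aggregate rate across a number of writers."""
    return total / writers


if __name__ == "__main__":
    target_iops = 40_000   # from the requirements
    target_mb_s = 2_500    # from the requirements
    writers = 100          # "about 100 writers"

    print("average write size (KB):", average_write_kb(target_mb_s, target_iops))
    print("IOPS per writer:", per_writer_share(target_iops, writers))
    print("MB/s per writer:", per_writer_share(target_mb_s, writers))
```

The average comes out at 62.5 KB, which is where the "roughly 64k" reading
comes from. It is only an average, so the real write-size distribution may
look quite different.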

That should be doable, but you probably want something like NVMe for DB+WAL.

You might want to tune it so that larger writes also go into the WAL to
speed up the ingress writes. But you mainly want more spindles rather
than fewer.
I would like to give a little more insight into this, and most probably
into some overhead we currently have in those numbers. Those values
come from our old classic RAID storage boxes. Those use btrfs + zlib
compression + subvolumes for those backups, and we've collected those
numbers from all of them.

The new system should just replicate snapshots from the live ceph.
Hopefully being able to use Erasure Coding and compression? ;-)

Compression might work, but only if the data is compressible.

EC usually writes very fast, so that's good. I would recommend a lot of
spindles though. More spindles == more OSDs == more performance.

So instead of using 12TB drives you can consider 6TB or 8TB drives.
Currently we have a lot of 5TB 2.5" drives in place, so we could use
them. We would like to start with around 4,000 IOPS and 250 MB per
second while using 24-drive boxes. We could place one or two NVMe PCIe
cards in them.
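
For a feel of what the starting target means per drive, here is a rough
Python sketch. It spreads 4000 IOPS and 250 MB per second evenly over the
drives of a single 24-drive box, which is an assumption since the number
of boxes is not stated. The `write_multiplier` parameter is a hypothetical
knob for backend write amplification (for example 3 with 3 replicas). It
ignores DB/WAL offload to NVMe and any write coalescing, and applying the
same factor to both IOPS and bandwidth is a simplification that fits
replication better than EC.

```python
def per_drive_load(
    client_iops: float,
    client_mb_s: float,
    drives: int,
    write_multiplier: float = 1.0,
) -> tuple[float, float]:
    """Return (IOPS, MB/s) each drive must absorb for an evenly spread load."""
    backend_iops = client_iops * write_multiplier
    backend_mb_s = client_mb_s * write_multiplier
    return backend_iops / drives, backend_mb_s / drives


if __name__ == "__main__":
    # Client-side load only, one hypothetical 24-drive box.
    iops, mb_s = per_drive_load(4000, 250, drives=24)
    print(f"no amplification: {iops:.0f} IOPS, {mb_s:.1f} MB/s per drive")

    # Same load with every write landing on three drives (3 replicas).
    iops, mb_s = per_drive_load(4000, 250, drives=24, write_multiplier=3)
    print(f"3 replicas:       {iops:.0f} IOPS, {mb_s:.1f} MB/s per drive")
```

If the per-drive IOPS figure ends up well above what a single HDD can
sustain, that points towards more boxes, more drives or offloading small
writes to the NVMe cards.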

Stefan

Wido

Greets,
Stefan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com