We're evaluating persistent block providers for Kubernetes and looking at Ceph at the moment.
We aren't seeing performance anywhere near what we expect.
I have a 50-node proof-of-concept cluster, 40 nodes of which are available for storage and configured with Rook/Ceph. Each storage node has 10 GbE NICs and 8 x 1 TB SSDs (only 3 drives on each node have been allocated to Ceph).
We are testing with replicated pools of size 1 and 3. I've been running fio tests in parallel (a pod set up to run fio for each stream), and aggregate bandwidth averages around 150 MB/s.
I'm running the fio tests as follows:
direct=1, fsync=8|16|32|64, readwrite=write, blocksize=4k, numjobs=4|8, size=10G
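For example, one of the parallel streams ends up looking roughly like this inside the pod (the job name and filename are placeholders for the RBD-backed volume mount; --ioengine and --group_reporting are my additions for clarity, not part of the original parameter list):

# job name and --filename are placeholders for the mounted RBD volume;
# --ioengine/--group_reporting are assumptions added for readability of the results
fio --name=rbdwrite --filename=/data/fio-testfile --direct=1 --rw=write --bs=4k --numjobs=8 --size=10G --fsync=32 --ioengine=libaio --group_reporting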
Regardless of whether I run 1 stream or 40, the aggregate bandwidth reported by "ceph -s" is ~150 MB/s (full output below).
I'm creating my pool with pg_num/pgp_num = 1024|2048|4096.
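For the 2048-PG case that's roughly the following (the pool name is just an example; the application-enable line is what I'd expect to need on Luminous and later, not something from my original notes):

# pool name "rbdbench" is an example; tried pg_num/pgp_num of 1024, 2048 and 4096
ceph osd pool create rbdbench 2048 2048 replicated
ceph osd pool set rbdbench size 3        # also tested with size 1
# application tag is an assumption for Luminous+; rook may set this itself for pools it creates
ceph osd pool application enable rbdbench rbd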
A baseline dd (a 100 GB file written with a 1 GB block size) on these SSDs shows them capable of 1.6 GB/s.
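That baseline was along these lines (the path is a placeholder for a filesystem on one of the local SSDs, and oflag=direct is my assumption to keep the page cache out of the measurement):

# path is a placeholder; oflag=direct is an assumption to bypass the page cache
dd if=/dev/zero of=/mnt/ssd1/ddtest bs=1G count=100 oflag=direct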
I can't seem to find any limitations or bottlenecks on the nodes or the network.
Anyone have any idea where else I can look?
I'm new to Ceph and it just seems like this setup should be pushing more I/O. I've dug through a lot of performance-tuning sites and have implemented most of the suggestions.
# ceph -s
  cluster:
    id:     949a8caf-9a9b-4f09-8711-1d5158a65bd8
    health: HEALTH_OK

  services:
    mon: 7 daemons, quorum rook-ceph-mon1,rook-ceph-mon3,rook-ceph-mon0,rook-ceph-mon5,rook-ceph-mon4,rook-ceph-mon2,rook-ceph-mon6
    mgr: rook-ceph-mgr0(active)
    osd: 123 osds: 123 up, 123 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 134k objects, 508 GB
    usage:   1240 GB used, 110 TB / 112 TB avail
    pgs:     2048 active+clean

  io:
    client: 138 MB/s wr, 0 op/s rd, 71163 op/s wr
Thanks for any help,
CC