Re: Slow Performance - Sequential IO

"Anthony Brandelli (abrandel)" <abrandel@xxxxxxxxx> · Fri, 17 Jan 2020 23:01:09 +0000

I have not been able to make any headway on this, despite significant effort.

-Tested all 48 SSDs with fio directly; all came within 10% of each other for 4k IOPS in rand|seq read|write.

-Disabled all CPU power save. 

-Tested with both rbd cache enabled and disabled on the client. 

-Tested with drive caches enabled and disabled (hdparm)

-Minimal TCP retransmissions under load (<10 for a 2 minute duration). 

-No drops/pause frames noted on upstream switches. 

-CPU load on OSD nodes peaks at ~6.

-iostat shows a peak of 15ms under read/write workloads, %util peaks at about 10%. 

-Swapped out the RBD client for a bigger box, since the load was peaking at 16. Now a 24 core box, load still peaks at 16. 

-Disabled cephx signatures

-Verified hardware health (nothing in dmesg, nothing in CIMC fault logs, storage controller logs)

-Tested multiple SSDs at once to find the controller's IOPS limit, which is apparently 650k @ 4k.

Nothing has made a noticeable difference here. I'm pretty baffled as to what would cause such awful sequential read and write performance while random r/w speeds remain good.

I switched up fio testing methodologies to use more threads, but this didn't seem to help either:

[global]
bs=4k
ioengine=rbd
iodepth=32
size=5g
runtime=120
numjobs=4
group_reporting=1
pool=rbd_af1
rbdname=image1

[seq-read]
rw=read
stonewall

[rand-read]
rw=randread
stonewall

[seq-write]
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall
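
For comparing the four jobs across runs, it can help to have fio emit JSON and pull out the IOPS per job. The following is a minimal sketch, not part of the original testing: it assumes fio is on the PATH and that the job file above is saved under a hypothetical name such as rbd-test.fio. The helper names are made up for illustration. With group_reporting=1, fio reports one aggregated entry per job group.

```python
import json
import subprocess


def summarize_fio_json(text):
    """Return {jobname: {"read_iops": ..., "write_iops": ...}} from fio JSON output."""
    report = json.loads(text)
    summary = {}
    for job in report["jobs"]:
        summary[job["jobname"]] = {
            "read_iops": job["read"]["iops"],
            "write_iops": job["write"]["iops"],
        }
    return summary


def run_fio(jobfile):
    """Run fio on a job file (hypothetical helper) and return the IOPS summary."""
    result = subprocess.run(
        ["fio", "--output-format=json", jobfile],
        check=True,
        capture_output=True,
        text=True,
    )
    return summarize_fio_json(result.stdout)


# Example use (requires fio and access to the cluster):
#   for name, iops in run_fio("rbd-test.fio").items():
#       print(name, iops["read_iops"], iops["write_iops"])
```

Keeping the parsing separate from the fio invocation means saved JSON reports from earlier runs can be summarized the same way.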

Any pointers are appreciated at this point. I've been following other threads on the mailing list related to RBD performance, and have looked through the archives, but none of the solutions that worked for others seem to have helped this setup.

Thanks,

Anthony

From: Anthony Brandelli (abrandel) <abrandel@xxxxxxxxx>

Sent: Tuesday, January 14, 2020 12:43 AM

To: ceph-users@xxxxxxxxxxxxxx <ceph-users@xxxxxxxxxxxxxx>

Subject: Slow Performance - Sequential IO

I have a newly set up test cluster that is giving some surprising numbers when running fio against an RBD. The end goal here is to see how viable a Ceph-based iSCSI SAN of sorts is for VMware clusters, which require a bunch of random IO.

Hardware:
2x E5-2630L v2 (2.4GHz, 6 core)
256GB RAM
2x 10gbps bonded network, Intel X520
LSI 9271-8i, SSDs used for OSDs in JBOD mode
Mons: 2x 1.2TB 10K SAS in RAID1
OSDs: 12x Samsung MZ6ER800HAGL-00003 800GB SAS SSDs, super cap/power loss protection

Cluster setup:
Three mon nodes, four OSD nodes
Two OSDs per SSD
Replica 3 pool
Ceph 14.2.5

Ceph status:
  cluster:
    id:     e3d93b4a-520c-4d82-a135-97d0bda3e69d
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 6d)
    mgr: mon2(active, since 6d), standbys: mon3, mon1
    osd: 96 osds: 96 up (since 3d), 96 in (since 3d)

  data:
    pools:   1 pools, 3072 pgs
    objects: 857.00k objects, 1.8 TiB
    usage:   432 GiB used, 34 TiB / 35 TiB avail
    pgs:     3072 active+clean
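
As a sanity check on the layout, the figures above can be cross-checked with a little arithmetic. This sketch only uses numbers quoted in this mail (four OSD nodes, 12 SSDs per node, two OSDs per SSD, 3072 PGs, replica 3). The variable names are illustrative.

```python
# Figures taken from the hardware list, cluster setup and ceph status above.
osd_nodes = 4
ssds_per_node = 12
osds_per_ssd = 2
pgs = 3072
replicas = 3

total_ssds = osd_nodes * ssds_per_node            # SSDs across the cluster
total_osds = total_ssds * osds_per_ssd            # should match "96 osds"
pg_replicas_per_osd = pgs * replicas / total_osds  # average PG copies per OSD

print(f"SSDs: {total_ssds}, OSDs: {total_osds}, "
      f"PG replicas per OSD: {pg_replicas_per_osd:.0f}")
```

This gives 48 SSDs and 96 OSDs, matching the status output, with an average of 96 PG replicas per OSD.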

Network between nodes tests at 9.88gbps. Direct testing of the SSDs using a 4K block in fio shows 127k seq read, 86k random read, 107k seq write, 52k random write. No high CPU load/interface saturation is noted when running tests against the rbd.

When testing with a 4K block size against an RBD on a dedicated metal test host (same specs as other cluster nodes noted above) I get the following (command similar to fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=32 -rw=XXXX -pool=scbench -runtime=60 -rbdname=datatest):

10k sequential read iops
69k random read iops
13k sequential write iops
22k random write iops
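
To put the gap in numbers, the RBD results can be expressed as a fraction of the direct SSD results quoted earlier. This is a rough sketch that assumes the direct-SSD figures are IOPS for a single drive, so it compares one raw SSD against the whole cluster seen through one client. It shows the relative shape of the problem and is not a like-for-like benchmark.

```python
# Direct SSD results (assumed to be 4K IOPS for one drive) and RBD results from this mail.
raw_ssd = {"seq read": 127_000, "rand read": 86_000,
           "seq write": 107_000, "rand write": 52_000}
rbd = {"seq read": 10_000, "rand read": 69_000,
       "seq write": 13_000, "rand write": 22_000}


def rbd_fraction(raw, measured):
    """Return measured IOPS as a fraction of raw IOPS per workload."""
    return {workload: measured[workload] / raw[workload] for workload in raw}


for workload, fraction in rbd_fraction(raw_ssd, rbd).items():
    print(f"{workload:10s} {fraction:6.1%}")
```

Sequential read and write come out at roughly 8% and 12% of the raw figures, while random read and write come out at roughly 80% and 42%.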

I’m not clear why the random ops, especially read, would be so much quicker compared to the sequential ops.
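
One possible factor, not confirmed anywhere in this thread, is how offsets map onto RBD objects. The sketch below assumes the default 4 MiB RBD object size and no custom striping, neither of which is stated in this mail. It counts how many distinct objects a window of 32 in-flight 4K requests touches for a sequential versus a random pattern over a 5 GiB range.

```python
import random

OBJECT_SIZE = 4 * 1024 * 1024  # assumed RBD object size (4 MiB default); not stated in this mail
BLOCK_SIZE = 4 * 1024          # bs=4k from the fio runs
IODEPTH = 32                   # iodepth=32 from the fio runs
IMAGE_RANGE = 5 * 1024 ** 3    # size=5g from the fio job file


def objects_touched(offsets):
    """Return the set of object indexes that the given byte offsets fall into."""
    return {offset // OBJECT_SIZE for offset in offsets}


# One queue's worth of sequential requests starting at offset 0.
sequential = [i * BLOCK_SIZE for i in range(IODEPTH)]

# One queue's worth of random, block-aligned requests (fixed seed for repeatability).
rng = random.Random(0)
total_blocks = IMAGE_RANGE // BLOCK_SIZE
scattered = [rng.randrange(total_blocks) * BLOCK_SIZE for _ in range(IODEPTH)]

print("sequential window touches", len(objects_touched(sequential)), "object(s)")
print("random window touches", len(objects_touched(scattered)), "object(s)")
print("4K requests per object:", OBJECT_SIZE // BLOCK_SIZE)
```

Under these assumptions the sequential window lands entirely in one object, and therefore on one PG and one primary OSD at a time, while the random window spreads across many objects. Whether that accounts for the numbers above would need to be tested, for example with a larger block size or a different striping layout.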

Any pointers appreciated.

Thanks,
Anthony

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com