Terrible iSCSI tgt RBD performance

I'm trying to get better performance out of exporting RBD volumes via
tgt for iSCSI consumers...

By terrible, I mean <5MB/sec reads and <50 IOPS. I'm pretty sure neither RBD
nor iSCSI itself is the problem, as they each perform fine individually:

iSCSI to RAM-backed: >60MB/sec, >500IOPS
iSCSI to SSD-backed: >50MB/sec, >300IOPS
iSCSI to RBD-backed: <5MB/sec, <50IOPS
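
For context, the RBD LUN is exported with tgt's rbd backstore, i.e. librbd
running inside tgtd rather than a kernel rbd mapping. A minimal
targets.conf-style sketch of that kind of export (the IQN and pool/image
below are placeholders, not the redacted real names):

<target iqn.2014-01.coop.libraries:rbd.test>
    driver iscsi
    # bs_rbd opens pool/image via librbd inside tgtd
    bs-type rbd
    backing-store rbd/test-image
</target>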

Cluster:
4 nodes (ceph1..4):
- Supermicro 6027TR-D70RF+ (2U twin systems)
  - Chassis A: ceph1, ceph2
  - Chassis B: ceph3, ceph4
- 2x E5-2650
- 256GB RAM
- 4x 4TB Seagate ST4000NM0023 SAS, dedicated to Ceph
- 2x 512GB Samsung 840 PRO
  - MD RAID1
  - LVM
  - LV: OS on 'root', 20GiB
  - LV: Ceph Journals, 8GB, one per Ceph disk
- 2x Bonded 1GbE network
- 10GbE network:
  - port1: to switch
  - port2: direct-connect pairs: ceph1/3 ceph2/4 (vertical between chassis)
- All 4 nodes run OSPF
  - ceph1/2; ceph3/4: ~9.8Gbit bandwidth confirmed
  - ceph1/3; ceph2/4: ~18.2Gbit bandwidth confirmed
- The nodes also host VMs under Ganeti, backed onto the SSDs with DRBD.
- S3 is the main Ceph use-case, and it works well from the VMs.

Direct RBD performance on the nodes is reasonably good, but it would be nice
if the random performance were better.

# rbd bench-write XXXXX
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
...
elapsed:    36  ops:   246603  ops/sec:  6681.20  bytes/sec: 29090920.91
# rbd bench-write XXXXX
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
...
elapsed:    48  ops:   246585  ops/sec:  5070.70  bytes/sec: 22080207.55
# rbd bench-write test.libraries.coop --io-pattern rand
bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern rand
...
elapsed:   324  ops:   246178  ops/sec:   757.74  bytes/sec: 3305000.99
# rbd bench-write test.libraries.coop --io-threads 16 --io-pattern rand --io-size 32768
bench-write  io_size 32768 io_threads 16 bytes 1073741824 pattern rand
...
elapsed:    86  ops:    30141  ops/sec:   347.39  bytes/sec: 12375512.34
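
As a cross-check on the librbd path outside of tgt, something like the
following fio run against the same image should give comparable random-IO
numbers (assuming a fio build with the rbd engine; the pool and client
name here are guesses):

# fio --name=rbd-rand --ioengine=rbd --clientname=admin --pool=rbd \
      --rbdname=test.libraries.coop --rw=randwrite --bs=4k --iodepth=16 \
      --runtime=60 --time_based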

Yes, I know the amount of data below looks small; I still have an older
cluster's data to migrate onto this newer hardware.

# ceph -w
    cluster 401a58ef-5075-49ec-9615-1c2973624252
     health HEALTH_WARN 6 pgs stuck unclean; recovery 8472/241829 objects degraded (3.503%); mds cluster is degraded; mds ceph1 is laggy
     monmap e3: 3 mons at {ceph1=10.77.10.41:6789/0,ceph2=10.77.10.42:6789/0,ceph4=10.77.10.44:6789/0}, election epoch 11486, quorum 0,1,2 ceph1,ceph2,ceph4
     mdsmap e1496661: 1/1/1 up {0=ceph1=up:replay(laggy or crashed)}
     osdmap e4323895: 16 osds: 16 up, 16 in
      pgmap v14695205: 481 pgs, 17 pools, 186 GB data, 60761 objects
            1215 GB used, 58356 GB / 59571 GB avail
            8472/241829 objects degraded (3.503%)
                   6 active
                 475 active+clean
  client io 67503 B/s rd, 7297 B/s wr, 13 op/s
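
In case it matters, the stuck PGs and per-OSD commit/apply latency can be
inspected with the usual commands (ceph osd perf assuming this release
already has it):

# ceph health detail
# ceph pg dump_stuck unclean
# ceph osd perf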


TGT setups:
Target 1: rbd.XXXXXXXXXXX
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 11
            Initiator: iqn.1993-08.org.debian:01:6b14da6a48b6 alias: XXXXXXXXXXXXXXXX
            Connection: 0
                IP Address: 10.77.110.6
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags: 
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 161061 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rbd
            Backing store path: XXXXXXXXXXXXXXXXXXXXXXx
            Backing store flags: 
    Account information:
    ACL information:
        XXXXXXXXXXXXXXXXXXXXXXXXXXXxx
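
For reference, a runtime tgtadm sequence that produces a target shaped like
the one above; the IQN, pool/image and ACL here are placeholders standing in
for the redacted values:

# tgtadm --lld iscsi --op new --mode target --tid 1 \
      --targetname iqn.2014-01.coop.libraries:rbd.test
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
      --bstype rbd --backing-store rbd/test-image
# tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL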

# tgtadm --lld iscsi --mode target --op show --tid 1
MaxRecvDataSegmentLength=8192
HeaderDigest=None
DataDigest=None
InitialR2T=Yes
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=65536
MaxBurstLength=262144
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=20
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=Yes
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
MaxXmitDataSegmentLength=8192
MaxQueueCmd=128
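
Those are per-target iSCSI parameters; the tgt-side ones (e.g. MaxQueueCmd,
MaxRecv/MaxXmitDataSegmentLength) can be changed with tgtadm, taking effect
for new sessions. The values below are only illustrative, not something I've
verified helps:

# tgtadm --lld iscsi --op update --mode target --tid 1 \
      --name MaxRecvDataSegmentLength --value 262144
# tgtadm --lld iscsi --op update --mode target --tid 1 \
      --name MaxQueueCmd --value 256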


-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail     : robbat2@xxxxxxxxxx
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85



