Hi Robin,

Just a few things to try (rough sketches of each follow the list):

1. Increase the number of worker threads for tgt (it's a parameter of tgtd,
   so modify however it's being started).
2. Disable librbd caching in ceph.conf.
3. Do you see the same performance problems exporting a krbd device as a
   block device via tgt?
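Untested sketches of all three, for reference. The thread count, the "rbd"
pool name, and the iqn are placeholders for your setup, and you should check
`tgtd --help` to confirm your build accepts --nr_iothreads before relying on
it:

    # 1) start the daemon with a larger worker-thread pool
    tgtd --nr_iothreads=128

    # 2) in ceph.conf on the host running tgtd
    [client]
    rbd cache = false

    # 3) map the image with krbd, then export the block device through
    #    tgt's default (rdwr) backing store instead of bs_rbd
    rbd map rbd/test.libraries.coop
    tgtadm --lld iscsi --mode target --op new --tid 2 \
        --targetname iqn.2015-03.example:krbd-test
    tgtadm --lld iscsi --mode logicalunit --op new --tid 2 --lun 1 \
        --backing-store /dev/rbd0
    tgtadm --lld iscsi --mode target --op bind --tid 2 --initiator-address ALL

Note that tgtd only reads ceph.conf when it opens the image, so restart it
after changing the cache setting; the mapped device may also appear as
/dev/rbd/<pool>/<image> rather than /dev/rbd0.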
Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Robin H. Johnson
> Sent: 17 March 2015 18:25
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Terrible iSCSI tgt RBD performance
>
> I'm trying to get better performance out of exporting RBD volumes via tgt
> for iSCSI consumers...
>
> By terrible, I mean <5MB/sec reads and <50 IOPS. I'm pretty sure neither
> RBD nor iSCSI itself is the problem, as they individually perform well:
>
> iSCSI to RAM-backed: >60MB/sec, >500 IOPS
> iSCSI to SSD-backed: >50MB/sec, >300 IOPS
> iSCSI to RBD-backed: <5MB/sec, <50 IOPS
>
> Cluster:
> 4 nodes (ceph1..4):
> - Supermicro 6027TR-D70RF+ (2U twin systems)
>   - Chassis A: ceph1, ceph2
>   - Chassis B: ceph3, ceph4
> - 2x E5-2650
> - 256GB RAM
> - 4x 4TB Seagate ST4000NM0023 SAS, dedicated to Ceph
> - 2x 512GB Samsung 840 PRO
>   - MD RAID1
>   - LVM
>     - LV: OS on 'root', 20GiB
>     - LV: Ceph journals, 8GB, one per Ceph disk
> - 2x bonded 1GbE network
> - 10GbE network:
>   - port1: to switch
>   - port2: direct-connect pairs: ceph1/3, ceph2/4 (vertical between chassis)
> - All 4 nodes run OSPF
>   - ceph1/2; ceph3/4: ~9.8Gbit bandwidth confirmed
>   - ceph1/3; ceph2/4: ~18.2Gbit bandwidth confirmed
> - The nodes also co-house VMs with Ganeti, backed onto the SSDs w/ DRBD;
> - S3 is the main Ceph use-case, and it works well from the VMs.
>
> Direct performance on the nodes is reasonably good, but it would be nice
> if the random performance were better.
>
> # rbd bench-write XXXXX
> bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq
> ...
> elapsed: 36 ops: 246603 ops/sec: 6681.20 bytes/sec: 29090920.91
> # rbd bench-write XXXXX
> bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern seq
> ...
> elapsed: 48 ops: 246585 ops/sec: 5070.70 bytes/sec: 22080207.55
> # rbd bench-write test.libraries.coop --io-pattern rand
> bench-write io_size 4096 io_threads 16 bytes 1073741824 pattern rand
> ...
> elapsed: 324 ops: 246178 ops/sec: 757.74 bytes/sec: 3305000.99
> # rbd bench-write test.libraries.coop --io-threads 16 --io-pattern rand --io-size 32768
> bench-write io_size 32768 io_threads 16 bytes 1073741824 pattern rand
> ...
> elapsed: 86 ops: 30141 ops/sec: 347.39 bytes/sec: 12375512.34
>
> Yes, I know the data below seems small; I have another older cluster of
> data that I still have to merge onto this newer hardware.
>
> # ceph -w
>     cluster 401a58ef-5075-49ec-9615-1c2973624252
>      health HEALTH_WARN 6 pgs stuck unclean; recovery 8472/241829 objects degraded (3.503%); mds cluster is degraded; mds ceph1 is laggy
>      monmap e3: 3 mons at {ceph1=10.77.10.41:6789/0,ceph2=10.77.10.42:6789/0,ceph4=10.77.10.44:6789/0}, election epoch 11486, quorum 0,1,2 ceph1,ceph2,ceph4
>      mdsmap e1496661: 1/1/1 up {0=ceph1=up:replay(laggy or crashed)}
>      osdmap e4323895: 16 osds: 16 up, 16 in
>       pgmap v14695205: 481 pgs, 17 pools, 186 GB data, 60761 objects
>             1215 GB used, 58356 GB / 59571 GB avail
>             8472/241829 objects degraded (3.503%)
>                    6 active
>                  475 active+clean
>   client io 67503 B/s rd, 7297 B/s wr, 13 op/s
>
> TGT setup:
> Target 1: rbd.XXXXXXXXXXX
>     System information:
>         Driver: iscsi
>         State: ready
>     I_T nexus information:
>         I_T nexus: 11
>             Initiator: iqn.1993-08.org.debian:01:6b14da6a48b6 alias: XXXXXXXXXXXXXXXX
>             Connection: 0
>                 IP Address: 10.77.110.6
>     LUN information:
>         LUN: 0
>             Type: controller
>             SCSI ID: IET 00010000
>             SCSI SN: beaf10
>             Size: 0 MB, Block size: 1
>             Online: Yes
>             Removable media: No
>             Prevent removal: No
>             Readonly: No
>             SWP: No
>             Thin-provisioning: No
>             Backing store type: null
>             Backing store path: None
>             Backing store flags:
>         LUN: 1
>             Type: disk
>             SCSI ID: IET 00010001
>             SCSI SN: beaf11
>             Size: 161061 MB, Block size: 512
>             Online: Yes
>             Removable media: No
>             Prevent removal: No
>             Readonly: No
>             SWP: No
>             Thin-provisioning: No
>             Backing store type: rbd
>             Backing store path: XXXXXXXXXXXXXXXXXXXXXXx
>             Backing store flags:
>     Account information:
>     ACL information:
>         XXXXXXXXXXXXXXXXXXXXXXXXXXXxx
>
> # tgtadm --lld iscsi --mode target --op show --tid 1
> MaxRecvDataSegmentLength=8192
> HeaderDigest=None
> DataDigest=None
> InitialR2T=Yes
> MaxOutstandingR2T=1
> ImmediateData=Yes
> FirstBurstLength=65536
> MaxBurstLength=262144
> DataPDUInOrder=Yes
> DataSequenceInOrder=Yes
> ErrorRecoveryLevel=0
> IFMarker=No
> OFMarker=No
> DefaultTime2Wait=2
> DefaultTime2Retain=20
> OFMarkInt=Reject
> IFMarkInt=Reject
> MaxConnections=1
> RDMAExtensions=Yes
> TargetRecvDataSegmentLength=262144
> InitiatorRecvDataSegmentLength=262144
> MaxOutstandingUnexpectedPDUs=0
> MaxXmitDataSegmentLength=8192
> MaxQueueCmd=128
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com