Re: Poor RBD performance as LIO iSCSI target

I'm running into weird issues here as well, in a test environment. I don't have a solution either, but perhaps we can find some things in common.

Setup in a nutshell:
- Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with separate public/cluster networks at 10 Gbps)
- iSCSI Proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (10 Gbps)
- Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)

Relevant cluster config: writeback cache tiering with NVMe PCIe cards (2 replicas) in front of an erasure-coded pool (k=3, m=2) backed by spindles.
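As a rough reference for this layout, here is a sketch of how many raw bytes each logical byte costs on either tier, assuming the usual Ceph semantics: a replicated pool of size N keeps N full copies, and an EC pool with k data and m coding chunks stores (k+m)/k. It ignores journals, metadata and any time an object spends on both tiers.

```python
from fractions import Fraction

def replicated_overhead(size: int) -> Fraction:
    """Raw bytes stored per logical byte in a replicated pool keeping `size` copies."""
    return Fraction(size)

def ec_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per logical byte in an erasure-coded pool (k data + m coding chunks)."""
    return Fraction(k + m, k)

# Layout described above: 2-replica NVMe cache tier over a k=3, m=2 EC pool.
cache_tier = replicated_overhead(2)
base_tier = ec_overhead(3, 2)

print(f"cache tier:   {float(cache_tier):.2f}x")  # cache tier:   2.00x
print(f"EC base tier: {float(base_tier):.2f}x")   # EC base tier: 1.67x
```

Setting the cache pool size to 1, as in one of the tests listed below, means each write lands on one cache-tier OSD instead of two.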

I'm following the instructions here: http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
I had no issues creating and mapping a 100 GB RBD image and then creating the target.

I'm interested in finding out the overhead/performance impact of re-exporting through iSCSI, so the idea is to run benchmarks.
Here's a fio test I'm trying to run on the client node against the attached iSCSI device:
fio --name=writefile --size=100G --filesize=100G --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
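For a sense of scale, the arithmetic behind this job, assuming fio's default `kb_base=1024` (so `G` and `M` are binary multiples): it issues 102,400 sequential 1 MiB writes and asks for up to 200 of them (200 MiB) to be outstanding at once.

```python
# Back-of-the-envelope numbers for the fio job above.
# Assumes fio's default kb_base=1024, i.e. 1M = 2**20 bytes and 100G = 100 * 2**30 bytes.
MIB = 2**20
GIB = 2**30

def fio_write_stats(size_bytes: int, bs_bytes: int, iodepth: int) -> dict:
    """Number of I/Os and the maximum bytes fio tries to keep in flight for a sequential job."""
    total_ios = -(-size_bytes // bs_bytes)  # ceiling division
    max_inflight_bytes = min(iodepth, total_ios) * bs_bytes
    return {"total_ios": total_ios, "max_inflight_bytes": max_inflight_bytes}

stats = fio_write_stats(size_bytes=100 * GIB, bs_bytes=1 * MIB, iodepth=200)
print(stats["total_ios"])                  # 102400
print(stats["max_inflight_bytes"] // MIB)  # 200
```

This is only what fio requests; the queue depth of the SCSI device on the initiator may cap the effective depth below 200.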

The benchmark eventually hangs for quite a few seconds towards the end of the test before completing.
On the proxy node, the kernel complains about an iSCSI portal login timeout (http://pastebin.com/Q49UnTPr), and I also see irqbalance errors in syslog (http://pastebin.com/AiRTWDwR).

Running the same test directly on the machines (raw device, RBD, the OSD filesystem) doesn't show any issues.

I've tried a few things to see if I could get this to work:
- Set irqbalance --hintpolicy=ignore (http://sourceforge.net/p/e1000/bugs/394/ & https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/1321425)
- Changed size on the cache pool to 1 (for the sake of testing; it improved performance but still hangs)
- Set CRUSH tunables to legacy (and back to optimal)
- Tried various package and kernel versions, and put the proxy node on Ubuntu Precise
- Formatted and mounted the iSCSI block device and ran the test on the resulting filesystem

I don't think it's related, but I don't remember running into issues before I swapped out the SSDs for the NVMe cards in the cache pool.
I don't have time *right now*, but I definitely want to test whether I can reproduce the issue on the SSDs.

Let me know if this gives you any ideas, I'm all ears.
--
David Moreau Simard

> On Oct 28, 2014, at 4:07 PM, Christopher Spearman <neromaverick@xxxxxxxxx> wrote:
> 
> Sage:
> 
> That'd be my assumption; performance looked pretty fantastic over loop until it started being used heavily.
> 
> Mike:
> 
> The configs you asked for are at the end of this message. I've removed and changed some info (IQN/WWN/portal) for security purposes. The raw and loop target configs are combined into one, since I'm currently running both types of config. I also included the running config (ls /) from targetcli for anyone interested in what it looks like from the console.
> 
> The tool I used was dd. I ran through various options with dd but didn't really see much difference. The one on top is my go-to command for a first test:
> 
> time dd if=/dev/zero of=test bs=32M count=32 oflag=direct,sync
> time dd if=/dev/zero of=test bs=32M count=128 oflag=direct,sync
> time dd if=/dev/zero of=test bs=8M count=512 oflag=direct,sync
> time dd if=/dev/zero of=test bs=4M count=1024 oflag=direct,sync
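Note that the first command writes 1 GiB and the other three 4 GiB each, so the `time` figures only become comparable once converted to throughput. A small sketch for doing that, assuming GNU dd's binary `M` suffix. Also worth keeping in mind: dd writes one block at a time, and with `oflag=direct,sync` each write has to complete before the next is issued, so these runs effectively measure queue depth 1, unlike the fio job with `--iodepth=200` above.

```python
# Sizes of the dd runs above, plus a helper to turn the "real" time printed by
# `time` into MiB/s. Assumes GNU dd, where the suffix M means 2**20 bytes.
DD_RUNS = [  # (bs in MiB, count) exactly as in the four commands above
    (32, 32),
    (32, 128),
    (8, 512),
    (4, 1024),
]

def total_mib(bs_mib: int, count: int) -> int:
    """Total MiB written by one dd run."""
    return bs_mib * count

def throughput_mib_s(bs_mib: int, count: int, real_seconds: float) -> float:
    """Average MiB/s for one run, given the wall-clock ('real') seconds."""
    if real_seconds <= 0:
        raise ValueError("real_seconds must be positive")
    return total_mib(bs_mib, count) / real_seconds

for bs, count in DD_RUNS:
    print(f"bs={bs}M count={count}: {total_mib(bs, count)} MiB")
# bs=32M count=32: 1024 MiB
# bs=32M count=128: 4096 MiB
# bs=8M count=512: 4096 MiB
# bs=4M count=1024: 4096 MiB
```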
> 
> 
> ---ls / from current targetcli (no mounted ext4 -> image file config)---
> 
> /iscsi> ls /
> o- / ......................................................................................................................... [...]
>   o- backstores .............................................................................................................. [...]
>   | o- block .................................................................................................. [Storage Objects: 2]
>   | | o- ceph_lun0 ...................................................................... [/dev/loop0 (2.0TiB) write-thru activated]
>   | | o- ceph_noloop00 .............................................. [/dev/rbd/vmiscsi/noloop00 (1.0TiB) write-thru activated]
>   | o- fileio ................................................................................................. [Storage Objects: 0]
>   | o- pscsi .................................................................................................. [Storage Objects: 0]
>   | o- ramdisk ................................................................................................ [Storage Objects: 0]
>   o- iscsi ............................................................................................................ [Targets: 2]
>   | o- iqn.gateway2_01 ..................................................... [TPGs: 1]
>   | | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
>   | |   o- acls .......................................................................................................... [ACLs: 2]
>   | |   | o- iqn.esxhost01 ............................................................ [Mapped LUNs: 1]
>   | |   | | o- mapped_lun0 ......................................................................... [lun0 block/ceph_noloop00 (rw)]
>   | |   | o- iqn.esxhost02 ....................................................... [Mapped LUNs: 1]
>   | |   |   o- mapped_lun0 ......................................................................... [lun0 block/ceph_noloop00 (rw)]
>   | |   o- luns .......................................................................................................... [LUNs: 1]
>   | |   | o- lun0 ........................................................... [block/ceph_noloop00 (/dev/rbd/vmiscsi/noloop00)]
>   | |   o- portals .................................................................................................... [Portals: 1]
>   | |     o- xxx.xxx.xxx.xxx:3260 ............................................................................................... [OK]
>   | o- iqn.gateway2_02 ..................................................... [TPGs: 1]
>   |   o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
>   |     o- acls .......................................................................................................... [ACLs: 2]
>   |     | o- iqn.esxhost01 ............................................................ [Mapped LUNs: 1]
>   |     | | o- mapped_lun0 ............................................................................. [lun0 block/ceph_lun0 (rw)]
>   |     | o- iqn.esxhost02 ............................................................ [Mapped LUNs: 1]
>   |     | | o- mapped_lun0 ............................................................................. [lun0 block/ceph_lun0 (rw)]
>   |     o- luns .......................................................................................................... [LUNs: 1]
>   |     | o- lun0 ................................................................................... [block/ceph_lun0 (/dev/loop0)]
>   |     o- portals .................................................................................................... [Portals: 1]
>   |       o- xxx.xxx.xxx.xxx:3260 ............................................................................................... [OK]
>   o- loopback ......................................................................................................... [Targets: 0]
> 
> ---saveconfig.json for mounted ext4 config---
> 
> {
>   "fabric_modules": [], 
>   "storage_objects": [
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 1, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 1, 
>         "max_unmap_lba_count": 8192, 
>         "max_write_same_len": 4096, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 1, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/mnt/ceph_perf_test/mounted_rbd_img_test.img", 
>       "name": "mounted_rbd_img_test", 
>       "plugin": "fileio", 
>       "size": 8589934592, 
>       "write_back": true, 
>       "wwn": "xxxx-xxxx-xxxx-xxxx"
>     }
>   ], 
>   "targets": [
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/fileio/mounted_rbd_img_test"
>             } 
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 } 
>               ], 
>               "node_wwn": "iqn.centoshost01"
>             } 
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway1_01"
>     }
>   ]
> }
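For a feel of what the `parameters` block means on the wire, here is a rough sketch of how one 1 MiB SCSI WRITE would be split, assuming these target-side values are also the ones in effect after login. They may not be: per RFC 3720, FirstBurstLength and MaxBurstLength settle on the lower of the two sides' offers, InitialR2T ends up Yes if either side wants it, ImmediateData only if both do, and MaxRecvDataSegmentLength is declared separately by each side. The model also assumes the target requests a full MaxBurstLength per R2T. With MaxOutstandingR2T=1, the R2T bursts for a single command are handled one after another.

```python
import math

def write_pdu_counts(length: int, first_burst: int, max_burst: int,
                     mrdsl: int, immediate_data: bool, initial_r2t: bool) -> dict:
    """Rough model of how one SCSI WRITE of `length` bytes travels over iSCSI.

    Simplified from RFC 3720: immediate data rides in the command PDU, extra
    unsolicited Data-Out is only allowed when InitialR2T=No (both together capped
    by FirstBurstLength), and everything else is solicited by R2Ts. Assumes the
    target asks for a full MaxBurstLength per R2T and every Data-Out PDU is filled
    up to the target's MaxRecvDataSegmentLength (mrdsl).
    """
    immediate = min(length, first_burst, mrdsl) if immediate_data else 0
    unsolicited = 0 if initial_r2t else min(length, first_burst) - immediate
    solicited = length - immediate - unsolicited

    data_out_pdus = math.ceil(unsolicited / mrdsl)
    r2ts = 0
    remaining = solicited
    while remaining > 0:
        burst = min(remaining, max_burst)  # one R2T covers at most MaxBurstLength
        data_out_pdus += math.ceil(burst / mrdsl)
        remaining -= burst
        r2ts += 1
    return {"immediate_bytes": immediate, "r2ts": r2ts, "data_out_pdus": data_out_pdus}

# Target-side values from the "parameters" block above, for a single 1 MiB write.
print(write_pdu_counts(length=1024 * 1024, first_burst=65536, max_burst=262144,
                       mrdsl=8192, immediate_data=True, initial_r2t=True))
# {'immediate_bytes': 8192, 'r2ts': 4, 'data_out_pdus': 127}
```

Larger MaxRecvDataSegmentLength and MaxBurstLength values reduce the PDU and R2T counts; whether that changes the results here would need to be measured.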
> 
> ---saveconfig.json for raw & loop targets---
> 
> {
>   "fabric_modules": [], 
>   "storage_objects": [
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 0, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 0, 
>         "max_unmap_lba_count": 0, 
>         "max_write_same_len": 65535, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 0, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/dev/rbd/vmiscsi/noloop00", 
>       "name": "ceph_noloop00", 
>       "plugin": "block", 
>       "readonly": false, 
>       "write_back": false, 
>       "wwn": "xxxx-xxxx-xxxx"
>     }, 
>     {
>       "attributes": {
>         "block_size": 512, 
>         "emulate_dpo": 0, 
>         "emulate_fua_read": 0, 
>         "emulate_fua_write": 1, 
>         "emulate_model_alias": 1, 
>         "emulate_rest_reord": 0, 
>         "emulate_tas": 1, 
>         "emulate_tpu": 0, 
>         "emulate_tpws": 0, 
>         "emulate_ua_intlck_ctrl": 0, 
>         "emulate_write_cache": 0, 
>         "enforce_pr_isids": 1, 
>         "fabric_max_sectors": 8192, 
>         "is_nonrot": 0, 
>         "max_unmap_block_desc_count": 0, 
>         "max_unmap_lba_count": 0, 
>         "max_write_same_len": 65535, 
>         "optimal_sectors": 8192, 
>         "queue_depth": 128, 
>         "unmap_granularity": 0, 
>         "unmap_granularity_alignment": 0
>       }, 
>       "dev": "/dev/loop0", 
>       "name": "ceph_lun0", 
>       "plugin": "block", 
>       "readonly": false, 
>       "write_back": false, 
>       "wwn": "yyyy-yyyy-yyyy"
>     }
>   ], 
>   "targets": [
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/block/ceph_noloop00"
>             }
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost01"
>             }, 
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost02"
>             }
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway2_01"
>     }, 
>     {
>       "fabric": "iscsi", 
>       "tpgs": [
>         {
>           "attributes": {
>             "authentication": 0, 
>             "cache_dynamic_acls": 0, 
>             "default_cmdsn_depth": 16, 
>             "demo_mode_write_protect": 1, 
>             "generate_node_acls": 0, 
>             "login_timeout": 15, 
>             "netif_timeout": 2, 
>             "prod_mode_write_protect": 0
>           }, 
>           "enable": true, 
>           "luns": [
>             {
>               "index": 0, 
>               "storage_object": "/backstores/block/ceph_lun0"
>             }
>           ], 
>           "node_acls": [
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [ 
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost01"
>             }, 
>             {
>               "attributes": {
>                 "dataout_timeout": 3, 
>                 "dataout_timeout_retries": 5, 
>                 "default_erl": 0, 
>                 "nopin_response_timeout": 30, 
>                 "nopin_timeout": 15, 
>                 "random_datain_pdu_offsets": 0, 
>                 "random_datain_seq_offsets": 0, 
>                 "random_r2t_offsets": 0
>               }, 
>               "mapped_luns": [
>                 {
>                   "index": 0, 
>                   "tpg_lun": 0, 
>                   "write_protect": false
>                 }
>               ], 
>               "node_wwn": "iqn.esxhost02"
>             }
>           ], 
>           "parameters": {
>             "AuthMethod": "CHAP,None", 
>             "DataDigest": "CRC32C,None", 
>             "DataPDUInOrder": "Yes", 
>             "DataSequenceInOrder": "Yes", 
>             "DefaultTime2Retain": "20", 
>             "DefaultTime2Wait": "2", 
>             "ErrorRecoveryLevel": "0", 
>             "FirstBurstLength": "65536", 
>             "HeaderDigest": "CRC32C,None", 
>             "IFMarkInt": "2048~65535", 
>             "IFMarker": "No", 
>             "ImmediateData": "Yes", 
>             "InitialR2T": "Yes", 
>             "MaxBurstLength": "262144", 
>             "MaxConnections": "1", 
>             "MaxOutstandingR2T": "1", 
>             "MaxRecvDataSegmentLength": "8192", 
>             "MaxXmitDataSegmentLength": "262144", 
>             "OFMarkInt": "2048~65535", 
>             "OFMarker": "No", 
>             "TargetAlias": "LIO Target"
>           }, 
>           "portals": [
>             {
>               "ip_address": "xxx.xxx.xxx.xxx", 
>               "iser": false, 
>               "port": 3260
>             }
>           ], 
>           "tag": 1
>         }
>       ], 
>       "wwn": "iqn.gateway2_02"
>     }
>   ]
> }
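Comparing the two configs by eye is tedious. The most visible difference in caching settings is that the fileio object has `write_back: true` and `emulate_write_cache: 1`, while both block objects have `write_back: false` and `emulate_write_cache: 0`. Below is a small sketch of a hypothetical helper (not part of targetcli) that pulls those fields, plus the initiator-to-backstore LUN mapping, out of a saveconfig.json. The field names are taken from the dumps above; it expects strict JSON as targetcli writes it, passed as the first command-line argument.

```python
import json
import sys

def summarize(config: dict) -> list:
    """One row per storage object with the settings that affect write caching."""
    rows = []
    for so in config.get("storage_objects", []):
        attrs = so.get("attributes", {})
        rows.append({
            "name": so.get("name"),
            "plugin": so.get("plugin"),
            "dev": so.get("dev"),
            "write_back": so.get("write_back"),
            "emulate_write_cache": attrs.get("emulate_write_cache"),
            "queue_depth": attrs.get("queue_depth"),
        })
    return rows

def lun_map(config: dict) -> list:
    """(target IQN, initiator IQN, backstore path) for every mapped LUN."""
    out = []
    for target in config.get("targets", []):
        for tpg in target.get("tpgs", []):
            # TPG LUN index -> backstore path, e.g. 0 -> "/backstores/block/ceph_lun0"
            luns = {lun["index"]: lun["storage_object"] for lun in tpg.get("luns", [])}
            for acl in tpg.get("node_acls", []):
                for mlun in acl.get("mapped_luns", []):
                    out.append((target["wwn"], acl["node_wwn"], luns.get(mlun["tpg_lun"])))
    return out

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        cfg = json.load(f)
    for row in summarize(cfg):
        print(row)
    for mapping in lun_map(cfg):
        print(*mapping)
```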
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
