Laurence,

Since the summary may be helpful to others, I'm just going to send it to the list.

I've been able to reproduce the D state problem on both Infiniband and RoCE, but it is much easier to reproduce on RoCE due to another bug, and it doesn't require being at the server to yank the cable (remote power control of a switch may work as well). The bug seems to be triggered by an abrupt and unexpected break in communications.

Common config between both Infiniband and RoCE:
====
* Linux kernel 4.9 (using only inbox drivers, no OFED)
* Target and initiator both configured on the same subnet
* 100 GB ram disk exported by iser [1]
* Iser volume imported on the client and the whole block device formatted ext4
* FIO run on the iser volume on the client [2]
* Anything not mentioned in this document should be default (it is a pretty simple config)

Infiniband specific config:
====
* Any IB cards should work (my config has ConnectX-3, but the problem has also been seen on Connect-IB in our environment)
* Back to back (my config) or connected to a switch
* OpenSM running on the target (my config) or on a separate host (I'm not sure how cutting power to the switch affects triggering the bug; I believe it will still trigger)
* While running the fio job, pull the cable on the initiator side. After about 120 seconds the fio job will fail and the iscsi processes should be in D state on the target.

RoCE specific config:
====
* Only tested with ConnectX-4-LX cards (I don't know whether other cards will trigger the problem; pulling the cable as in the Infiniband section may also trigger the bug if it doesn't trigger automatically)
* Hosts must be connected by a switch or a Linux bridge that doesn't have RoCE offload. I was able to trigger the bug with a back to back connection if the target clamps the speed to 10 Gb [3].
* Running the fio job should be enough to make the RoCE card unexpectedly drop the RDMA connection, which should then cause the target iscsi processes to go into D state.
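To check for the end state described above on the target, here is a minimal sketch that lists tasks in D (uninterruptible sleep) state by reading `/proc/<pid>/stat`. The function names and the optional `name_filter` argument are hypothetical and only for illustration; filtering on "iscsi" assumes the stuck target threads carry that string in their command name, which you should verify on your own system.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc", name_filter=None):
    """List (pid, comm) for every task whose state field is 'D'.

    name_filter is an optional substring (for example "iscsi", an
    assumption about the thread names) used to narrow the result.
    """
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or its stat file was unreadable
        if state != "D":
            continue
        if name_filter is None or name_filter in comm:
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in d_state_tasks(name_filter="iscsi"):
        print(pid, comm)
```

Run it on the target a couple of minutes after the connection drops. Tasks pass through D state briefly during normal I/O, so a single hit is not conclusive; the same PIDs showing up on repeated runs is the more telling sign.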

For either the Infiniband or RoCE setup, the bug can be triggered with only two hosts connected back to back.

If something is still not clear, please let me know.

[1] /etc/saveconfig.json
```json
{
  "fabric_modules": [],
  "storage_objects": [
    {
      "attributes": {
        "block_size": 512,
        "emulate_3pc": 1,
        "emulate_caw": 1,
        "emulate_dpo": 0,
        "emulate_fua_read": 0,
        "emulate_fua_write": 1,
        "emulate_model_alias": 1,
        "emulate_rest_reord": 0,
        "emulate_tas": 1,
        "emulate_tpu": 0,
        "emulate_tpws": 0,
        "emulate_ua_intlck_ctrl": 0,
        "emulate_write_cache": 0,
        "enforce_pr_isids": 1,
        "force_pr_aptpl": 0,
        "is_nonrot": 1,
        "max_unmap_block_desc_count": 0,
        "max_unmap_lba_count": 0,
        "max_write_same_len": 0,
        "optimal_sectors": 4294967288,
        "pi_prot_format": 0,
        "pi_prot_type": 0,
        "queue_depth": 128,
        "unmap_granularity": 0,
        "unmap_granularity_alignment": 0
      },
      "name": "test1",
      "plugin": "ramdisk",
      "size": 107374182400,
      "wwn": "7486ed41-585e-400f-8799-ac605485b221"
    }
  ],
  "targets": [
    {
      "fabric": "iscsi",
      "tpgs": [
        {
          "attributes": {
            "authentication": 0,
            "cache_dynamic_acls": 1,
            "default_cmdsn_depth": 64,
            "default_erl": 0,
            "demo_mode_discovery": 1,
            "demo_mode_write_protect": 0,
            "generate_node_acls": 1,
            "login_timeout": 15,
            "netif_timeout": 2,
            "prod_mode_write_protect": 0,
            "t10_pi": 0
          },
          "enable": true,
          "luns": [
            {
              "index": 0,
              "storage_object": "/backstores/ramdisk/test1"
            }
          ],
          "node_acls": [],
          "parameters": {
            "AuthMethod": "CHAP,None",
            "DataDigest": "CRC32C,None",
            "DataPDUInOrder": "Yes",
            "DataSequenceInOrder": "Yes",
            "DefaultTime2Retain": "20",
            "DefaultTime2Wait": "2",
            "ErrorRecoveryLevel": "0",
            "FirstBurstLength": "65536",
            "HeaderDigest": "CRC32C,None",
            "IFMarkInt": "Reject",
            "IFMarker": "No",
            "ImmediateData": "Yes",
            "InitialR2T": "Yes",
            "MaxBurstLength": "262144",
            "MaxConnections": "1",
            "MaxOutstandingR2T": "1",
            "MaxRecvDataSegmentLength": "8192",
            "MaxXmitDataSegmentLength": "262144",
            "OFMarkInt": "Reject",
            "OFMarker": "No",
            "TargetAlias": "LIO Target"
          },
          "portals": [
            {
              "ip_address": "0.0.0.0",
              "iser": true,
              "port": 3260
            }
          ],
          "tag": 1
        }
      ],
      "wwn": "iqn.2016-12.com.betterservers"
    }
  ]
}
```

[2] echo "3" > /proc/sys/vm/drop_caches; fio --rw=read --bs=4K --size=1G --numjobs=40 --name=worker.matt --group_reporting

[3] ethtool -s eth3 speed 10000 advertise 0x80000

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1