Re: iscsi_trx going into D state

Robert LeBlanc <robert@xxxxxxxxxxxxx> · Fri, 6 Jan 2017 12:12:58 -0700

Laurence,

Since the summary may be helpful to others, I'm just going to send it
to the list.

I've been able to reproduce the D state problem on both Infiniband and
RoCE. It is much easier to reproduce on RoCE because of another bug,
and there it doesn't require being at the server to yank the cable
(remote power control of a switch may work as well). The bug seems to
be triggered by an abrupt and unexpected break in communications.

Common config between both Infiniband and RoCE:
====
* Linux kernel 4.9 (using only inbox drivers, no OFED)
* Target and initiator both configured on the same subnet
* 100 GB ram disk exported by iser [1]
* Iser volume imported on client and the whole block device formatted ext4.
* FIO run on iser volume on the client [2]
* Anything not mentioned in this document should be default (it is a
pretty simple config)

Infiniband specific config:
====
* Any IB cards should work (my config has ConnectX-3, but has also
been seen on Connect-IB in our environment)
* Back to back (my config) or connected to a switch
* OpenSM running on the target (my config), or on a separate host (I'm
not sure how cutting power to the switch would affect triggering the
bug, but I believe it will still trigger)
* While running the fio job, pull the cable on the initiator side.
After about 120 seconds the fio job will fail and the iscsi processes
should be in D state on the target.

RoCE specific config:
====
* Only tested with ConnectX-4-LX cards (I don't know if other cards
will trigger the problem; if it doesn't trigger automatically, pulling
the cable as in the Infiniband section may also trigger the bug)
* Hosts must be connected by a switch or a Linux bridge that doesn't
have RoCE offload. I was able to trigger the bug with a back to back
connection if the target clamps the speed to 10 Gb [3].
* Running the fio job should be enough to trigger the RoCE card to
unexpectedly drop the RDMA connection and that should then cause the
target iscsi processes to go into D state.

For either the Infiniband or RoCE setup, the bug can be triggered with
only two hosts connected back to back. If something is still not
clear, please let me know.
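
To confirm the hang on the target after the connection drops, something
like the following can list the iscsi threads that are stuck in D state.
This is only a sketch: the helper names (find_d_state, the "iscsi" name
prefix) are my own assumptions for illustration, and it simply parses
the output of `ps -eo pid,stat,comm`.

```python
import subprocess


def find_d_state(ps_output, name_prefix="iscsi"):
    """Return (pid, comm) pairs for D-state processes whose name starts with name_prefix.

    ps_output is expected to be the text printed by `ps -eo pid,stat,comm`.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not "pid stat comm".
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, stat, comm = fields[0], fields[1], fields[2].strip()
        # A STAT value beginning with "D" means uninterruptible sleep.
        if stat.startswith("D") and comm.startswith(name_prefix):
            hits.append((int(pid), comm))
    return hits


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, comm in find_d_state(out):
        print(pid, comm)
```

Run on the target a couple of minutes after the break; an empty result
means no iscsi thread is currently in D state.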

[1] /etc/saveconfig.json
```json
{
  "fabric_modules": [],
  "storage_objects": [
    {
      "attributes": {
        "block_size": 512,
        "emulate_3pc": 1,
        "emulate_caw": 1,
        "emulate_dpo": 0,
        "emulate_fua_read": 0,
        "emulate_fua_write": 1,
        "emulate_model_alias": 1,
        "emulate_rest_reord": 0,
        "emulate_tas": 1,
        "emulate_tpu": 0,
        "emulate_tpws": 0,
        "emulate_ua_intlck_ctrl": 0,
        "emulate_write_cache": 0,
        "enforce_pr_isids": 1,
        "force_pr_aptpl": 0,
        "is_nonrot": 1,
        "max_unmap_block_desc_count": 0,
        "max_unmap_lba_count": 0,
        "max_write_same_len": 0,
        "optimal_sectors": 4294967288,
        "pi_prot_format": 0,
        "pi_prot_type": 0,
        "queue_depth": 128,
        "unmap_granularity": 0,
        "unmap_granularity_alignment": 0
      },
      "name": "test1",
      "plugin": "ramdisk",
      "size": 107374182400,
      "wwn": "7486ed41-585e-400f-8799-ac605485b221"
    }
  ],
  "targets": [
    {
      "fabric": "iscsi",
      "tpgs": [
        {
          "attributes": {
            "authentication": 0,
            "cache_dynamic_acls": 1,
            "default_cmdsn_depth": 64,
            "default_erl": 0,
            "demo_mode_discovery": 1,
            "demo_mode_write_protect": 0,
            "generate_node_acls": 1,
            "login_timeout": 15,
            "netif_timeout": 2,
            "prod_mode_write_protect": 0,
            "t10_pi": 0
          },
          "enable": true,
          "luns": [
            {
              "index": 0,
              "storage_object": "/backstores/ramdisk/test1"
            }
          ],
          "node_acls": [],
          "parameters": {
            "AuthMethod": "CHAP,None",
            "DataDigest": "CRC32C,None",
            "DataPDUInOrder": "Yes",
            "DataSequenceInOrder": "Yes",
            "DefaultTime2Retain": "20",
            "DefaultTime2Wait": "2",
            "ErrorRecoveryLevel": "0",
            "FirstBurstLength": "65536",
            "HeaderDigest": "CRC32C,None",
            "IFMarkInt": "Reject",
            "IFMarker": "No",
            "ImmediateData": "Yes",
            "InitialR2T": "Yes",
            "MaxBurstLength": "262144",
            "MaxConnections": "1",
            "MaxOutstandingR2T": "1",
            "MaxRecvDataSegmentLength": "8192",
            "MaxXmitDataSegmentLength": "262144",
            "OFMarkInt": "Reject",
            "OFMarker": "No",
            "TargetAlias": "LIO Target"
          },
          "portals": [
            {
              "ip_address": "0.0.0.0",
              "iser": true,
              "port": 3260
            }
          ],
          "tag": 1
        }
      ],
      "wwn": "iqn.2016-12.com.betterservers"
    }
  ]
}
```
[2] echo "3" > /proc/sys/vm/drop_caches; fio --rw=read --bs=4K
--size=1G --numjobs=40 --name=worker.matt --group_reporting
[3] ethtool -s eth3 speed 10000 advertise 0x80000
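
For anyone wondering about the 0x80000 mask in [3]: as far as I know
from the ethtool(8) man page, it is the advertise bit for 10000baseKR
Full. The small sketch below decodes a mask against a partial table;
the table covers only a few link modes and the function name is a
hypothetical one chosen for illustration.

```python
# Partial table of legacy ethtool "advertise" bits, as listed in ethtool(8).
# Only a handful of modes are included here; this is not the full list.
LINK_MODES = {
    0x020: "1000baseT Full",
    0x1000: "10000baseT Full",
    0x40000: "10000baseKX4 Full",
    0x80000: "10000baseKR Full",
}


def decode_advertise(mask):
    """Return (names of known modes set in mask, leftover bits not in the table)."""
    names = [name for bit, name in sorted(LINK_MODES.items()) if mask & bit]
    known_bits = 0
    for bit in LINK_MODES:
        known_bits |= bit
    return names, mask & ~known_bits


if __name__ == "__main__":
    print(decode_advertise(0x80000))
```
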
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1