Re: FC target Errors

On Thu, 9 Jul 2015, 7:38pm -0700, Craig Watson wrote:

It is apparent that we need a check on the transfer size.  Crashing the
kernel is not a good thing.  It is apparent that the expected behaviour
of the back end isn't going to cut it due to some problems on certain
front end fabrics (fibre channel for instance).  So, what Nicholas is
proposing here seems to address the immediate issue of overrunning
things and it looks like an improvement over the older
fabric_max_sectors approach (across all LUNs) that was removed.  Basing
it on max sg entries seems like a good enhancement too.  Unfortunately,
the reasons for the removal of fabric_max_sectors remain.  As Nicholas
noted in commit 046ba64, most initiators ignore the MAXIMUM TRANSFER
LENGTH limit published.  Also, many (most?) legacy devices expect a much
larger transfer size to be handled.  They don't check for a limit
either.  They just blast away and everything falls apart.

It is not just legacy stuff though.  The one thing that seems constant
in the data storage business is that data gets bigger. That brings us to
what Roland and Arun have brought up.  We are going to need a better
solution in the long run.  Expecting the hardware to handle this when
other platforms deal with it in their driver stack isn't going to
happen.  That's a matter of economics I've never seen broken in this
business.  When performance drives larger cache sizes, we might see a
change.

I have a current need for transfer sizes in the 30MB range. Fibre
channel systems (with QLogic HBAs) were doing this 10-15 years ago.
Imposing a limit of 4MB now is closing the barn door long after the
horses are out.  Incidentally, on legacy systems, large transfer sizes
can make a huge difference in system throughput.  I have no idea why
since, at the sizes involved (4MB vs 30MB), the small amount of
protocol overhead added by breaking a transaction up into 7 or 10 pieces
seems trivial.  My customers have demonstrated this though.  Their
systems fail to operate within requirements when they reduce the
transfer size. Fibre Channel has been the high-end performance leader
for a long time, and many legacy systems that were pushed onto Fibre
Channel due to performance requirements were tweaked and optimized even then.

So, how do I make this happen?  Can someone (Nicholas? Roland? Arun?)
point me to the areas of code where this should be worked?

From a stack perspective, something like this is what I had in mind:
(BTW, I claim no expertise on the stack)

{ read case example }
1a target_submit_cmd
1b -> transport_handle_cdb_direct
1c -> transport_generic_new_cmd
1d -> target_alloc_sgl { modify this to allocate with constraints }
1e -> target_execute_cmd

2a target_complete_cmd
2b -> target_complete_ok_work
2c -> cmd->se_tfo->queue_data_in
      { With flags in cmd indicating only data, and a data offset updated. }
      { For the last part, indicate that status needs to be sent. }

3a <fabric-drv-sending-data>
3b <fabric-drv-when-done>
3c target_cmd_done_partial { New interface }

4a target_cmd_done_partial
4b -> free earlier allocated sgs { or re-use if possible }
4c -> continue from (1d)

There will be additional logic required if T10 PI is used and its SGLs are also considered.

From a qla2xxx-only perspective, we are internally discussing some
alternatives to remove this limit. Himanshu (copied) will get back with a temporary patch for you to try out when it is ready.

Arun, is there a way to get hardware documentation?

This can be shared only under NDA. I will send you a private email with a QLogic internal contact whom you can work with to obtain the doc.

Regards,
-Arun


Craig Watson


On 07/08/2015 04:25 PM, Arun Easi wrote:
Hi Nick,

On Mon, 6 Jul 2015, 9:08pm -0700, Nicholas A. Bellinger wrote:

(Adding Giri + Andrew CC')

On Wed, 2015-07-01 at 15:50 -0700, Roland Dreier wrote:
On Wed, Jul 1, 2015 at 3:41 PM, Craig Watson
<craig.watson@xxxxxxxxxxxxxxxxxxx> wrote:
So, is it really a limitation in the hardware? Or is it in the qla2xxx
driver? Does the qla2xxx driver not support chained scatter/gather
lists?

Since the removal of fabric_max_sectors et al., how is this size limit
communicated to an initiator? From what I understood, the target front
end is going to reflect the limit of the back-end storage. Right now
what I (and our customers) see is just a refusal to operate at large
(> 4.9MB) transfer sizes. At some write sizes there aren't even any
error indications on the target. The target just stops responding.
This isn't a good thing. It's better than crashing, but not by much.
Other sizes do seem to crash the kernel, which really isn't a good
thing.

The hardware really has the limitation that a single CTIO ("continue
target IO") control block can only have 1200-something scatter-gather
entries (which matches well with your 4.9 MB limit, given that each
entry will typically be a 4KB page). However, the fact that this
limits the size of a SCSI command is of course in driver code. LIO
can use multiple CTIOs to respond to a single SCSI command; it
doesn't at the moment, though.

There is no way this is communicated to the initiator that I know of
at the moment.  I did not understand the "remove IO size limit"
commits when they went in, but they definitely leave the target stack
in a broken state WRT this type of issue.


Ok, so qla2xxx really does need some manner of max_sectors enforced by
target-core due to its hw sgl limits, but I think the original way of
using a backend attribute hard-coded across all LUNs was wrong.

I agree with the general idea here (at least short term).


How about a fabric driver provided field like the following instead..?

diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index 33364b6..ce4e84f 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -1814,6 +1814,7 @@ static const struct target_core_fabric_ops tcm_qla2xxx_ops = {
    .module                = THIS_MODULE,
    .name                = "qla2xxx",
    .node_acl_size            = sizeof(struct tcm_qla2xxx_nacl),
+    .fabric_hw_max_sectors        = 8192,

As the limit is in number of SG entries, my preference would be to
just say that, instead of expressing it in sector terms, something like:
    .max_sg_entries            = XX,

The stack could internally convert that to max-sectors using a
page-size and sector-size calculation, report it in EVPD 0xb0, and use
it for enforcement.

Longer term (Roland alluded to this, I think), it would be best to
have the stack call .queue_data_in/.write_pending with at most
"max_sg_entries" SG entries at a time and repeat until all the data is
transferred, thus supporting effectively unlimited data transfer
lengths.

Regards,
-Arun

    .get_fabric_name        = tcm_qla2xxx_get_fabric_name,
    .tpg_get_wwn            = tcm_qla2xxx_get_fabric_wwn,
    .tpg_get_tag            = tcm_qla2xxx_get_tag,
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index e318ddb..e730439 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -1081,6 +1081,15 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops)

    if (cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) {
        unsigned long long end_lba;
+
+        if (cmd->se_tfo->fabric_hw_max_sectors &&
+            sectors > cmd->se_tfo->fabric_hw_max_sectors) {
+            printk_ratelimited(KERN_ERR "SCSI OP %02xh with too"
+                " big sectors %u exceeds fabric_hw_max_sectors:"
+                " %d\n", cdb[0], sectors,
+                cmd->se_tfo->fabric_hw_max_sectors);
+            return TCM_INVALID_CDB_FIELD;
+        }
check_lba:
        end_lba = dev->transport->get_blocks(dev) + 1;
        if (((cmd->t_task_lba + sectors) < cmd->t_task_lba) ||
diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index b5ba1ec..5675dc6 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -517,7 +517,8 @@ spc_emulate_evpd_b0(struct se_cmd *cmd, unsigned char *buf)
    /*
     * Set MAXIMUM TRANSFER LENGTH
     */
-    put_unaligned_be32(dev->dev_attrib.hw_max_sectors, &buf[8]);
+    put_unaligned_be32(min_not_zero(cmd->se_tfo->fabric_hw_max_sectors,
+               dev->dev_attrib.hw_max_sectors), &buf[8]);

    /*
     * Set OPTIMAL TRANSFER LENGTH
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 18afef9..e47ef8b 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -5,6 +5,12 @@ struct target_core_fabric_ops {
    struct module *module;
    const char *name;
    size_t node_acl_size;
+    /*
+     * If the fabric is hardware limited by the number of sectors it can
+     * handle in a single I/O request, notify target-core to enforce this
+     * limit and report during EVPD=0xb0 MAXIMUM TRANSFER LENGTH.
+     */
+    u32 fabric_hw_max_sectors;
    char *(*get_fabric_name)(void);
    char *(*tpg_get_wwn)(struct se_portal_group *);
    u16 (*tpg_get_tag)(struct se_portal_group *);

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html





