Re: FC target Errors

Hi Nic,

Thanks for the response; my comments are below.

On Sat, 25 Jul 2015, 2:02am -0700, Nicholas A. Bellinger wrote:

Hello Arun, Roland & Co,

Thanks for your feedback.  Apologies for the delayed response on this bug.

On Wed, 2015-07-08 at 13:25 -0700, Arun Easi wrote:
Hi Nick,

On Mon, 6 Jul 2015, 9:08pm -0700, Nicholas A. Bellinger wrote:

(Adding Giri + Andrew CC')

On Wed, 2015-07-01 at 15:50 -0700, Roland Dreier wrote:
On Wed, Jul 1, 2015 at 3:41 PM, Craig Watson
<craig.watson@xxxxxxxxxxxxxxxxxxx> wrote:
So, is it really a limitation in the hardware, or is it in the qla2xxx
driver?  Does the qla2xxx driver not support chained scatter/gather lists?

Since the removal of fabric_max_sectors et al., how is this size limit
communicated to an initiator?  From what I understood, the target front end
is supposed to reflect the limit of the back-end storage.  Right now what I
(and our customers) see is just a refusal to operate at large (> 4.9 MB)
transfer sizes.  At some write sizes there aren't even any error indications
on the target; the target just stops responding.  That isn't a good thing.
It's better than crashing, but not by much.  Other sizes do seem to crash the
kernel, which really isn't a good thing.

The hardware really does have the limitation that a single CTIO ("continue
target IO") control block can only carry 1200-something scatter-gather
entries (which matches your 4.9 MB limit well, given that each entry will
typically be a 4KB page).  However, the fact that this caps the size of a
single SCSI command is of course down to driver code.  LIO could use
multiple CTIOs to respond to a single SCSI command; it just doesn't at the
moment.
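(Roughly: 1200 entries * 4096 bytes per entry = 4,915,200 bytes, i.e. about
4.9 MB, which lines up with the observed cutoff.)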

As far as I know, there is currently no way this limit is communicated to
the initiator.  I did not understand the "remove IO size limit" commits
when they went in, but they definitely leave the target stack in a broken
state WRT this type of issue.


Ok, so qla2xxx really does need some manner of max_sectors enforced by
target-core due to its hw sgl limits, but I think the original way of
using a backend attribute hard-coded across all LUNs was wrong.

I agree with the general idea here (at least in the short term).


How about a fabric-driver-provided field like the following instead?

diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index 33364b6..ce4e84f 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -1814,6 +1814,7 @@ static const struct target_core_fabric_ops tcm_qla2xxx_ops = {
	.module				= THIS_MODULE,
	.name				= "qla2xxx",
	.node_acl_size			= sizeof(struct tcm_qla2xxx_nacl),
+	.fabric_hw_max_sectors		= 8192,

As the limit is really in number of SG entries, my preference would be to
express it as such rather than in sector terms, something like:
 	.max_sg_entries			= XX,

The stack could internally convert that to max-sectors using a page-size
and sector-size calculation, report it in EVPD b0, and use it for
enforcement, roughly as sketched below.
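Untested sketch of the reporting side (max_sg_entries here is just the
field name proposed above, and this assumes one PAGE_SIZE page per SG
entry):

	u32 max_sectors = (cmd->se_tfo->max_sg_entries * PAGE_SIZE) /
			  dev->dev_attrib.block_size;

	put_unaligned_be32(min_not_zero(max_sectors,
			   dev->dev_attrib.hw_max_sectors), &buf[8]);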

Longer term (Roland alluded to this, I think), it would be best to have
the stack call .queue_data_in/.write_pending with at most "max_sg_entries"
SG entries at a time and repeat until all of the data has been transferred,
thus allowing an effectively unlimited data transfer length.
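Very roughly, something like this pseudo-code on the target-core side
(queue_chunk() is a made-up helper standing in for however the per-chunk
.queue_data_in/.write_pending invocation would actually get wired up):

	u32 done = 0;
	u32 chunk_max = cmd->se_tfo->max_sg_entries * PAGE_SIZE;

	while (done < cmd->data_length) {
		u32 chunk = min(cmd->data_length - done, chunk_max);

		/* map only the [done, done + chunk) portion of t_data_sg
		 * and kick the fabric callback for just that piece */
		queue_chunk(cmd, done, chunk);
		done += chunk;
	}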


Here's a better work-around, following your earlier comments, that
enforces a fabric driver limit on se_cmd->t_data_sg at the target-core
level, currently assuming a 1:1 PAGE_SIZE mapping per internally allocated
scatterlist entry.

The max_data_sg_nents=1200 default for tcm_qla2xxx follows Roland's
earlier comments.
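With PAGE_SIZE=4096 and 512-byte blocks, the EVPD b0 MAXIMUM TRANSFER
LENGTH should end up capped at 9600 sectors (1200 * 4096 / 512); running
sg_vpd --page=bl on a Linux initiator is one way to confirm what actually
gets reported, for hosts that honour it.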

Please review.

From 272f87488fc8681eedb39937cba0b39fb59c0bba Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
Date: Mon, 6 Jul 2015 20:47:55 -0700
Subject: [PATCH] target: Enforce fabric max_data_sg_nents I/O payload limit

This patch changes target-core to enforce an optional fabric driver
provided max_data_sg_nents limit, using target_alloc_sgl() level checking
against the existing single PAGE_SIZE per scatterlist entry assumption.

If the se_cmd payload requires more scatterlist entries than the
provided value, this patch returns SAM_STAT_CHECK_CONDITION +
ILLEGAL_REQUEST exception status.

This is currently required by tcm_qla2xxx to restrict the number of
scatter-gather-list entries in each se_cmd->t_data_sg descriptor payload.

It also updates the associated target_alloc_sgl() callers and
transport_generic_new_cmd() to handle the proper sense_reason_t values for
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE + TCM_SECTOR_COUNT_TOO_MANY.

Reported-by: Craig Watson <craig.watson@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
---
drivers/scsi/qla2xxx/tcm_qla2xxx.c     |  5 ++++
drivers/target/target_core_spc.c       | 13 +++++++---
drivers/target/target_core_transport.c | 45 ++++++++++++++++++++--------------
drivers/target/target_core_xcopy.c     |  2 +-
include/target/target_core_backend.h   |  2 +-
include/target/target_core_fabric.h    |  9 +++++++
6 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index 9224a06..6649809 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -1804,6 +1804,11 @@ static const struct target_core_fabric_ops tcm_qla2xxx_ops = {
	.module				= THIS_MODULE,
	.name				= "qla2xxx",
	.node_acl_size			= sizeof(struct tcm_qla2xxx_nacl),
+	/*
+	 * XXX: Limit assumes single page per scatter-gather-list entry.
+	 * Current maximum is ~4.9 MB per se_cmd->t_data_sg with PAGE_SIZE=4096
+	 */
+	.max_data_sg_nents		= 1200,
	.get_fabric_name		= tcm_qla2xxx_get_fabric_name,
	.tpg_get_wwn			= tcm_qla2xxx_get_fabric_wwn,
	.tpg_get_tag			= tcm_qla2xxx_get_tag,
diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
index b5ba1ec..dd5249f 100644
--- a/drivers/target/target_core_spc.c
+++ b/drivers/target/target_core_spc.c
@@ -484,8 +484,8 @@ static sense_reason_t
spc_emulate_evpd_b0(struct se_cmd *cmd, unsigned char *buf)
{
	struct se_device *dev = cmd->se_dev;
-	int have_tp = 0;
-	int opt, min;
+	u32 mtl = 0;
+	int have_tp = 0, opt, min;

	/*
	 * Following spc3r22 section 6.5.3 Block Limits VPD page, when
@@ -516,8 +516,15 @@ spc_emulate_evpd_b0(struct se_cmd *cmd, unsigned char *buf)

	/*
	 * Set MAXIMUM TRANSFER LENGTH
+	 *
+	 * Currently assumes single PAGE_SIZE per scatterlist for fabrics
+	 * enforcing maximum HW scatter-gather-list entry limit
	 */
-	put_unaligned_be32(dev->dev_attrib.hw_max_sectors, &buf[8]);
+	if (cmd->se_tfo->max_data_sg_nents) {
+		mtl = (cmd->se_tfo->max_data_sg_nents * PAGE_SIZE) /
+		       dev->dev_attrib.block_size;
+	}
+	put_unaligned_be32(min_not_zero(mtl, dev->dev_attrib.hw_max_sectors), &buf[8]);

	/*
	 * Set OPTIMAL TRANSFER LENGTH
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index ce8574b..2c934fc 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2254,9 +2254,9 @@ void transport_kunmap_data_sg(struct se_cmd *cmd)
}
EXPORT_SYMBOL(transport_kunmap_data_sg);

-int
+sense_reason_t
target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
-		 bool zero_page)
+		 bool zero_page, u32 max_nents)
{
	struct scatterlist *sg;
	struct page *page;
@@ -2265,9 +2265,16 @@ target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
	int i = 0;

	nent = DIV_ROUND_UP(length, PAGE_SIZE);
+	if (max_nents && nent > max_nents) {
+		pr_err_ratelimited("Required sg nents: %u exceeded fabric driver"
+				   " enforced max_nents: %u, failing command\n",
+				   nent, max_nents);
+		return TCM_SECTOR_COUNT_TOO_MANY;

How about setting a residual count of the difference and allowing IO to succeed? Thus, initiators could still make forward progress without worrying so much about the fabric limitation.
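Something along these lines, perhaps (untested sketch; it would have to
live somewhere se_cmd is in scope, e.g. transport_generic_new_cmd(), rather
than in target_alloc_sgl() itself):

	if (cmd->se_tfo->max_data_sg_nents) {
		u32 max_len = cmd->se_tfo->max_data_sg_nents * PAGE_SIZE;

		if (cmd->data_length > max_len) {
			/* clamp the transfer and report the difference
			 * back as residual instead of failing */
			cmd->residual_count += cmd->data_length - max_len;
			cmd->se_cmd_flags |= SCF_UNDERFLOW_BIT;
			cmd->data_length = max_len;
		}
	}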

Rest of the changes look good.

Regards,
-Arun

+	}
+
	sg = kmalloc(sizeof(struct scatterlist) * nent, GFP_KERNEL);
	if (!sg)
-		return -ENOMEM;
+		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;

	sg_init_table(sg, nent);

@@ -2283,7 +2290,7 @@ target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
	}
	*sgl = sg;
	*nents = nent;
-	return 0;
+	return TCM_NO_SENSE;

out:
	while (i > 0) {
@@ -2291,7 +2298,7 @@ out:
		__free_page(sg_page(&sg[i]));
	}
	kfree(sg);
-	return -ENOMEM;
+	return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
}

/*
@@ -2302,15 +2309,15 @@ out:
sense_reason_t
transport_generic_new_cmd(struct se_cmd *cmd)
{
-	int ret = 0;
+	sense_reason_t ret = TCM_NO_SENSE;
	bool zero_flag = !(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB);

	if (cmd->prot_op != TARGET_PROT_NORMAL &&
	    !(cmd->se_cmd_flags & SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC)) {
		ret = target_alloc_sgl(&cmd->t_prot_sg, &cmd->t_prot_nents,
-				       cmd->prot_length, true);
-		if (ret < 0)
-			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+				       cmd->prot_length, true, 0);
+		if (ret)
+			return ret;
	}

	/*
@@ -2333,15 +2340,17 @@ transport_generic_new_cmd(struct se_cmd *cmd)

			ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
					       &cmd->t_bidi_data_nents,
-					       bidi_length, zero_flag);
-			if (ret < 0)
-				return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+					       bidi_length, zero_flag,
+					       cmd->se_tfo->max_data_sg_nents);
+			if (ret)
+				return ret;
		}

		ret = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
-				       cmd->data_length, zero_flag);
-		if (ret < 0)
-			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+				       cmd->data_length, zero_flag,
+				       cmd->se_tfo->max_data_sg_nents);
+		if (ret)
+			return ret;
	} else if ((cmd->se_cmd_flags & SCF_COMPARE_AND_WRITE) &&
		    cmd->data_length) {
		/*
@@ -2353,9 +2362,9 @@ transport_generic_new_cmd(struct se_cmd *cmd)

		ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
				       &cmd->t_bidi_data_nents,
-				       caw_length, zero_flag);
-		if (ret < 0)
-			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+				       caw_length, zero_flag, 0);
+		if (ret)
+			return ret;
	}
	/*
	 * If this command is not a write we can execute it right here,
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 4515f52..945b460 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -561,7 +561,7 @@ static int target_xcopy_setup_pt_cmd(

	if (alloc_mem) {
		rc = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
-				      cmd->data_length, false);
+				      cmd->data_length, false, 0);
		if (rc < 0) {
			ret = rc;
			goto out;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 1e5c8f9..5f2e8e1 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -85,7 +85,7 @@ extern struct configfs_attribute *passthrough_attrib_attrs[];
void	*transport_kmap_data_sg(struct se_cmd *);
void	transport_kunmap_data_sg(struct se_cmd *);
/* core helpers also used by xcopy during internal command setup */
-int	target_alloc_sgl(struct scatterlist **, unsigned int *, u32, bool);
+sense_reason_t	target_alloc_sgl(struct scatterlist **, unsigned int *, u32, bool, u32);
sense_reason_t	transport_generic_map_mem_to_cmd(struct se_cmd *,
		struct scatterlist *, u32, struct scatterlist *, u32);

diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 18afef9..357dc96 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -5,6 +5,15 @@ struct target_core_fabric_ops {
	struct module *module;
	const char *name;
	size_t node_acl_size;
+	/*
+	 * Limits number of scatterlist entries per SCF_SCSI_DATA_CDB payload.
+	 * Setting this value tells target-core to enforce this limit, and
+	 * report as INQUIRY EVPD=b0 MAXIMUM TRANSFER LENGTH.
+	 *
+	 * XXX: Not all initiator hosts honor this block-limit EVPD.
+	 * Currently assumes single PAGE_SIZE per scatterlist entry
+	 */
+	u32 max_data_sg_nents;
	char *(*get_fabric_name)(void);
	char *(*tpg_get_wwn)(struct se_portal_group *);
	u16 (*tpg_get_tag)(struct se_portal_group *);

--