Re: [RFC PATCHv2 1/2] target: Version 2 of TCMU ABI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2015-04-14 at 17:30 -0700, Andy Grover wrote:
> The initial version of TCMU (in 3.18) does not properly handle
> bidirectional SCSI commands -- those with both an in and out buffer. In
> looking to fix this it also became clear that TCMU's support for adding
> new types of entries (opcodes) to the command ring was broken. We need
> to fix this now, so that future issues can be handled properly by adding
> new opcodes.
> 
> We make the most of this ABI break by enabling bidi cmd handling within
> TCMP_OP_CMD opcode. Add an iov_bidi_cnt field to tcmu_cmd_entry.req.
> This enables TCMU to describe bidi commands, but further kernel work is
> needed for full bidi support.
> 
> Enlarge tcmu_cmd_entry_hdr by 32 bits by pulling in cmd_id and __pad1. Turn
> __pad1 into two 8 bit flags fields, for kernel-set and userspace-set flags,
> "kflags" and "uflags" respectively.
> 
> Update version fields so userspace can tell the interface is changed.
> 
> Update tcmu-design.txt with details of how new stuff works:
> - Specify an additional requirement for userspace to set UNKNOWN_OP
>   (bit 0) in hdr.uflags for unknown/unhandled opcodes.
> - Define how Data-In and Data-Out fields are described in req.iov[]
> 
> Changed in v2:
> - Change name of SKIPPED bit to UNKNOWN bit
> - PAD op does not set the bit any more
> - Change len_op helper functions to take just len_op, not the whole struct
> - Change version to 2 in missed spots, and use defines
> - Add 16 unused bytes to cmd_entry.req, in case additional SAM cmd
>   parameters need to be included
> - Add iov_dif_cnt field to specify buffers used for DIF info in iov[]
> - Rearrange fields to naturally align cdb_off
> - Handle if userspace sets UNKNOWN_OP by indicating failure of the cmd
> - Wrap some overly long UPDATE_HEAD lines
> 
> Signed-off-by: Andy Grover <agrover@xxxxxxxxxx>
> ---
>  Documentation/target/tcmu-design.txt  | 43 ++++++++++++++++++++++------------
>  drivers/target/target_core_user.c     | 44 +++++++++++++++++++++++++----------
>  include/uapi/linux/target_core_user.h | 44 +++++++++++++++++++++--------------
>  3 files changed, 87 insertions(+), 44 deletions(-)
> 
> diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt
> index 5518465..43e94ea 100644
> --- a/Documentation/target/tcmu-design.txt
> +++ b/Documentation/target/tcmu-design.txt
> @@ -138,27 +138,40 @@ signals the kernel via a 4-byte write(). When cmd_head equals
>  cmd_tail, the ring is empty -- no commands are currently waiting to be
>  processed by userspace.
>  
> -TCMU commands start with a common header containing "len_op", a 32-bit
> -value that stores the length, as well as the opcode in the lowest
> -unused bits. Currently only two opcodes are defined, TCMU_OP_PAD and
> -TCMU_OP_CMD. When userspace encounters a command with PAD opcode, it
> -should skip ahead by the bytes in "length". (The kernel inserts PAD
> -entries to ensure each CMD entry fits contigously into the circular
> -buffer.)
> -
> -When userspace handles a CMD, it finds the SCSI CDB (Command Data
> -Block) via tcmu_cmd_entry.req.cdb_off. This is an offset from the
> -start of the overall shared memory region, not the entry. The data
> -in/out buffers are accessible via tht req.iov[] array. Note that
> -each iov.iov_base is also an offset from the start of the region.
> -
> -TCMU currently does not support BIDI operations.
> +TCMU commands are 8-byte aligned. They start with a common header
> +containing "len_op", a 32-bit value that stores the length, as well as
> +the opcode in the lowest unused bits. It also contains cmd_id and
> +flags fields for setting by the kernel (kflags) and userspace
> +(uflags).
> +
> +Currently only two opcodes are defined, TCMU_OP_CMD and TCMU_OP_PAD.
> +
> +When the opcode is CMD, the entry in the command ring is a struct
> +tcmu_cmd_entry. Userspace finds the SCSI CDB (Command Data Block) via
> +tcmu_cmd_entry.req.cdb_off. This is an offset from the start of the
> +overall shared memory region, not the entry. The data in/out buffers
> +are accessible via tht req.iov[] array. iov_cnt contains the number of
> +entries in iov[] needed to describe either the Data-In or Data-Out
> +buffers. For bidirectional commands, iov_cnt specifies how many iovec
> +entries cover the Data-Out area, and iov_bidi_count specifies how many
> +iovec entries immediately after that in iov[] cover the Data-In
> +area. Just like other fields, iov.iov_base is an offset from the start
> +of the region.
>  
>  When completing a command, userspace sets rsp.scsi_status, and
>  rsp.sense_buffer if necessary. Userspace then increments
>  mailbox.cmd_tail by entry.hdr.length (mod cmdr_size) and signals the
>  kernel via the UIO method, a 4-byte write to the file descriptor.
>  
> +When the opcode is PAD, userspace only updates cmd_tail as above --
> +it's a no-op. (The kernel inserts PAD entries to ensure each CMD entry
> +is contiguous within the command ring.)
> +
> +More opcodes may be added in the future. If userspace encounters an
> +opcode it does not handle, it must set UNKNOWN_OP bit (bit 0) in
> +hdr.uflags, update cmd_tail, and proceed with processing additional
> +commands, if any.
> +
>  The Data Area:
>  
>  This is shared-memory space after the command ring. The organization
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index 1fbf304..8174897 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -344,8 +344,11 @@ static int tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
>  
>  		entry = (void *) mb + CMDR_OFF + cmd_head;
>  		tcmu_flush_dcache_range(entry, sizeof(*entry));
> -		tcmu_hdr_set_op(&entry->hdr, TCMU_OP_PAD);
> -		tcmu_hdr_set_len(&entry->hdr, pad_size);
> +		tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_PAD);
> +		tcmu_hdr_set_len(&entry->hdr.len_op, pad_size);
> +		entry->hdr.cmd_id = 0; /* not used for PAD */
> +		entry->hdr.kflags = 0;
> +		entry->hdr.uflags = 0;
>  
>  		UPDATE_HEAD(mb->cmd_head, pad_size, udev->cmdr_size);
>  
> @@ -355,9 +358,11 @@ static int tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
>  
>  	entry = (void *) mb + CMDR_OFF + cmd_head;
>  	tcmu_flush_dcache_range(entry, sizeof(*entry));
> -	tcmu_hdr_set_op(&entry->hdr, TCMU_OP_CMD);
> -	tcmu_hdr_set_len(&entry->hdr, command_size);
> -	entry->cmd_id = tcmu_cmd->cmd_id;
> +	tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_CMD);
> +	tcmu_hdr_set_len(&entry->hdr.len_op, command_size);
> +	entry->hdr.cmd_id = tcmu_cmd->cmd_id;
> +	entry->hdr.kflags = 0;
> +	entry->hdr.uflags = 0;
>  
>  	/*
>  	 * Fix up iovecs, and handle if allocation in data ring wrapped.
> @@ -464,6 +469,17 @@ static void tcmu_handle_completion(struct tcmu_cmd *cmd, struct tcmu_cmd_entry *
>  		return;
>  	}
>  
> +	if (entry->hdr.uflags & TCMU_UFLAG_UNKNOWN_OP) {
> +		UPDATE_HEAD(udev->data_tail, cmd->data_length, udev->data_size);
> +		pr_warn("TCMU: Userspace set UNKNOWN_OP flag on se_cmd %p\n",
> +			cmd->se_cmd);
> +		transport_generic_request_failure(cmd->se_cmd,
> +			TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> +		cmd->se_cmd = NULL;
> +		kmem_cache_free(tcmu_cmd_cache, cmd);
> +		return;
> +	}
> +
>  	if (entry->rsp.scsi_status == SAM_STAT_CHECK_CONDITION) {
>  		memcpy(se_cmd->sense_buffer, entry->rsp.sense_buffer,
>  			       se_cmd->scsi_sense_length);
> @@ -542,14 +558,16 @@ static unsigned int tcmu_handle_completions(struct tcmu_dev *udev)
>  
>  		tcmu_flush_dcache_range(entry, sizeof(*entry));
>  
> -		if (tcmu_hdr_get_op(&entry->hdr) == TCMU_OP_PAD) {
> -			UPDATE_HEAD(udev->cmdr_last_cleaned, tcmu_hdr_get_len(&entry->hdr), udev->cmdr_size);
> +		if (tcmu_hdr_get_op(entry->hdr.len_op) == TCMU_OP_PAD) {
> +			UPDATE_HEAD(udev->cmdr_last_cleaned,
> +				    tcmu_hdr_get_len(entry->hdr.len_op),
> +				    udev->cmdr_size);
>  			continue;
>  		}
> -		WARN_ON(tcmu_hdr_get_op(&entry->hdr) != TCMU_OP_CMD);
> +		WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD);
>  
>  		spin_lock(&udev->commands_lock);
> -		cmd = idr_find(&udev->commands, entry->cmd_id);
> +		cmd = idr_find(&udev->commands, entry->hdr.cmd_id);
>  		if (cmd)
>  			idr_remove(&udev->commands, cmd->cmd_id);
>  		spin_unlock(&udev->commands_lock);
> @@ -562,7 +580,9 @@ static unsigned int tcmu_handle_completions(struct tcmu_dev *udev)
>  
>  		tcmu_handle_completion(cmd, entry);
>  
> -		UPDATE_HEAD(udev->cmdr_last_cleaned, tcmu_hdr_get_len(&entry->hdr), udev->cmdr_size);
> +		UPDATE_HEAD(udev->cmdr_last_cleaned,
> +			    tcmu_hdr_get_len(entry->hdr.len_op),
> +			    udev->cmdr_size);
>  
>  		handled++;
>  	}
> @@ -840,14 +860,14 @@ static int tcmu_configure_device(struct se_device *dev)
>  	udev->data_size = TCMU_RING_SIZE - CMDR_SIZE;
>  
>  	mb = udev->mb_addr;
> -	mb->version = 1;
> +	mb->version = TCMU_MAILBOX_VERSION;
>  	mb->cmdr_off = CMDR_OFF;
>  	mb->cmdr_size = udev->cmdr_size;
>  
>  	WARN_ON(!PAGE_ALIGNED(udev->data_off));
>  	WARN_ON(udev->data_size % PAGE_SIZE);
>  
> -	info->version = "1";
> +	info->version = xstr(TCMU_MAILBOX_VERSION);
>  
>  	info->mem[0].name = "tcm-user command & data buffer";
>  	info->mem[0].addr = (phys_addr_t) udev->mb_addr;
> diff --git a/include/uapi/linux/target_core_user.h b/include/uapi/linux/target_core_user.h
> index b483d19..b67f99d 100644
> --- a/include/uapi/linux/target_core_user.h
> +++ b/include/uapi/linux/target_core_user.h
> @@ -6,7 +6,7 @@
>  #include <linux/types.h>
>  #include <linux/uio.h>
>  
> -#define TCMU_VERSION "1.0"
> +#define TCMU_VERSION "2.0"
>  
>  /*
>   * Ring Design
> @@ -39,9 +39,13 @@
>   * should process the next packet the same way, and so on.
>   */
>  
> -#define TCMU_MAILBOX_VERSION 1
> +#define TCMU_MAILBOX_VERSION 2
>  #define ALIGN_SIZE 64 /* Should be enough for most CPUs */
>  
> +/* See https://gcc.gnu.org/onlinedocs/cpp/Stringification.html */
> +#define xstr(s) str(s)
> +#define str(s) #s
> +
>  struct tcmu_mailbox {
>  	__u16 version;
>  	__u16 flags;
> @@ -64,31 +68,36 @@ enum tcmu_opcode {
>   * Only a few opcodes, and length is 8-byte aligned, so use low bits for opcode.
>   */
>  struct tcmu_cmd_entry_hdr {
> -		__u32 len_op;
> +	__u32 len_op;
> +	__u16 cmd_id;
> +	__u8 kflags;
> +#define TCMU_UFLAG_UNKNOWN_OP 0x1
> +	__u8 uflags;
> +
>  } __packed;
>  
>  #define TCMU_OP_MASK 0x7
>  
> -static inline enum tcmu_opcode tcmu_hdr_get_op(struct tcmu_cmd_entry_hdr *hdr)
> +static inline enum tcmu_opcode tcmu_hdr_get_op(__u32 len_op)
>  {
> -	return hdr->len_op & TCMU_OP_MASK;
> +	return len_op & TCMU_OP_MASK;
>  }
>  
> -static inline void tcmu_hdr_set_op(struct tcmu_cmd_entry_hdr *hdr, enum tcmu_opcode op)
> +static inline void tcmu_hdr_set_op(__u32 *len_op, enum tcmu_opcode op)
>  {
> -	hdr->len_op &= ~TCMU_OP_MASK;
> -	hdr->len_op |= (op & TCMU_OP_MASK);
> +	*len_op &= ~TCMU_OP_MASK;
> +	*len_op |= (op & TCMU_OP_MASK);
>  }
>  
> -static inline __u32 tcmu_hdr_get_len(struct tcmu_cmd_entry_hdr *hdr)
> +static inline __u32 tcmu_hdr_get_len(__u32 len_op)
>  {
> -	return hdr->len_op & ~TCMU_OP_MASK;
> +	return len_op & ~TCMU_OP_MASK;
>  }
>  
> -static inline void tcmu_hdr_set_len(struct tcmu_cmd_entry_hdr *hdr, __u32 len)
> +static inline void tcmu_hdr_set_len(__u32 *len_op, __u32 len)
>  {
> -	hdr->len_op &= TCMU_OP_MASK;
> -	hdr->len_op |= len;
> +	*len_op &= TCMU_OP_MASK;
> +	*len_op |= len;
>  }
>  
>  /* Currently the same as SCSI_SENSE_BUFFERSIZE */
> @@ -97,13 +106,14 @@ static inline void tcmu_hdr_set_len(struct tcmu_cmd_entry_hdr *hdr, __u32 len)
>  struct tcmu_cmd_entry {
>  	struct tcmu_cmd_entry_hdr hdr;
>  
> -	uint16_t cmd_id;
> -	uint16_t __pad1;
> -
>  	union {
>  		struct {
> +			uint32_t iov_cnt;
> +			uint32_t iov_bidi_cnt;
> +			uint32_t iov_dif_cnt;
>  			uint64_t cdb_off;
> -			uint64_t iov_cnt;
> +			uint64_t __pad1;
> +			uint64_t __pad2;
>  			struct iovec iov[0];
>  		} req;
>  		struct {

(Adding MKP + MST + Paolo CC')

So this doesn't quite follow what virtio-scsi request header is defining
for allowing both pi_bytesout + pi_bytesin within a single op.  But then
again, virtio-scsi doesn't support BIDI commands either..

But if v2 TCMU ABI is support BIDI commands, then we'll also want to
allow individual request headers that contain T10-PI metadata for both
out/in payloads, yes..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux