On Tue, 2015-04-14 at 17:30 -0700, Andy Grover wrote: > The initial version of TCMU (in 3.18) does not properly handle > bidirectional SCSI commands -- those with both an in and out buffer. In > looking to fix this it also became clear that TCMU's support for adding > new types of entries (opcodes) to the command ring was broken. We need > to fix this now, so that future issues can be handled properly by adding > new opcodes. > > We make the most of this ABI break by enabling bidi cmd handling within > TCMP_OP_CMD opcode. Add an iov_bidi_cnt field to tcmu_cmd_entry.req. > This enables TCMU to describe bidi commands, but further kernel work is > needed for full bidi support. > > Enlarge tcmu_cmd_entry_hdr by 32 bits by pulling in cmd_id and __pad1. Turn > __pad1 into two 8 bit flags fields, for kernel-set and userspace-set flags, > "kflags" and "uflags" respectively. > > Update version fields so userspace can tell the interface is changed. > > Update tcmu-design.txt with details of how new stuff works: > - Specify an additional requirement for userspace to set UNKNOWN_OP > (bit 0) in hdr.uflags for unknown/unhandled opcodes. > - Define how Data-In and Data-Out fields are described in req.iov[] > > Changed in v2: > - Change name of SKIPPED bit to UNKNOWN bit > - PAD op does not set the bit any more > - Change len_op helper functions to take just len_op, not the whole struct > - Change version to 2 in missed spots, and use defines > - Add 16 unused bytes to cmd_entry.req, in case additional SAM cmd > parameters need to be included > - Add iov_dif_cnt field to specify buffers used for DIF info in iov[] > - Rearrange fields to naturally align cdb_off > - Handle if userspace sets UNKNOWN_OP by indicating failure of the cmd > - Wrap some overly long UPDATE_HEAD lines > > Signed-off-by: Andy Grover <agrover@xxxxxxxxxx> > --- > Documentation/target/tcmu-design.txt | 43 ++++++++++++++++++++++------------ > drivers/target/target_core_user.c | 44 +++++++++++++++++++++++++---------- > include/uapi/linux/target_core_user.h | 44 +++++++++++++++++++++-------------- > 3 files changed, 87 insertions(+), 44 deletions(-) > > diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt > index 5518465..43e94ea 100644 > --- a/Documentation/target/tcmu-design.txt > +++ b/Documentation/target/tcmu-design.txt > @@ -138,27 +138,40 @@ signals the kernel via a 4-byte write(). When cmd_head equals > cmd_tail, the ring is empty -- no commands are currently waiting to be > processed by userspace. > > -TCMU commands start with a common header containing "len_op", a 32-bit > -value that stores the length, as well as the opcode in the lowest > -unused bits. Currently only two opcodes are defined, TCMU_OP_PAD and > -TCMU_OP_CMD. When userspace encounters a command with PAD opcode, it > -should skip ahead by the bytes in "length". (The kernel inserts PAD > -entries to ensure each CMD entry fits contigously into the circular > -buffer.) > - > -When userspace handles a CMD, it finds the SCSI CDB (Command Data > -Block) via tcmu_cmd_entry.req.cdb_off. This is an offset from the > -start of the overall shared memory region, not the entry. The data > -in/out buffers are accessible via tht req.iov[] array. Note that > -each iov.iov_base is also an offset from the start of the region. > - > -TCMU currently does not support BIDI operations. > +TCMU commands are 8-byte aligned. They start with a common header > +containing "len_op", a 32-bit value that stores the length, as well as > +the opcode in the lowest unused bits. It also contains cmd_id and > +flags fields for setting by the kernel (kflags) and userspace > +(uflags). > + > +Currently only two opcodes are defined, TCMU_OP_CMD and TCMU_OP_PAD. > + > +When the opcode is CMD, the entry in the command ring is a struct > +tcmu_cmd_entry. Userspace finds the SCSI CDB (Command Data Block) via > +tcmu_cmd_entry.req.cdb_off. This is an offset from the start of the > +overall shared memory region, not the entry. The data in/out buffers > +are accessible via tht req.iov[] array. iov_cnt contains the number of > +entries in iov[] needed to describe either the Data-In or Data-Out > +buffers. For bidirectional commands, iov_cnt specifies how many iovec > +entries cover the Data-Out area, and iov_bidi_count specifies how many > +iovec entries immediately after that in iov[] cover the Data-In > +area. Just like other fields, iov.iov_base is an offset from the start > +of the region. > > When completing a command, userspace sets rsp.scsi_status, and > rsp.sense_buffer if necessary. Userspace then increments > mailbox.cmd_tail by entry.hdr.length (mod cmdr_size) and signals the > kernel via the UIO method, a 4-byte write to the file descriptor. > > +When the opcode is PAD, userspace only updates cmd_tail as above -- > +it's a no-op. (The kernel inserts PAD entries to ensure each CMD entry > +is contiguous within the command ring.) > + > +More opcodes may be added in the future. If userspace encounters an > +opcode it does not handle, it must set UNKNOWN_OP bit (bit 0) in > +hdr.uflags, update cmd_tail, and proceed with processing additional > +commands, if any. > + > The Data Area: > > This is shared-memory space after the command ring. The organization > diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c > index 1fbf304..8174897 100644 > --- a/drivers/target/target_core_user.c > +++ b/drivers/target/target_core_user.c > @@ -344,8 +344,11 @@ static int tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd) > > entry = (void *) mb + CMDR_OFF + cmd_head; > tcmu_flush_dcache_range(entry, sizeof(*entry)); > - tcmu_hdr_set_op(&entry->hdr, TCMU_OP_PAD); > - tcmu_hdr_set_len(&entry->hdr, pad_size); > + tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_PAD); > + tcmu_hdr_set_len(&entry->hdr.len_op, pad_size); > + entry->hdr.cmd_id = 0; /* not used for PAD */ > + entry->hdr.kflags = 0; > + entry->hdr.uflags = 0; > > UPDATE_HEAD(mb->cmd_head, pad_size, udev->cmdr_size); > > @@ -355,9 +358,11 @@ static int tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd) > > entry = (void *) mb + CMDR_OFF + cmd_head; > tcmu_flush_dcache_range(entry, sizeof(*entry)); > - tcmu_hdr_set_op(&entry->hdr, TCMU_OP_CMD); > - tcmu_hdr_set_len(&entry->hdr, command_size); > - entry->cmd_id = tcmu_cmd->cmd_id; > + tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_CMD); > + tcmu_hdr_set_len(&entry->hdr.len_op, command_size); > + entry->hdr.cmd_id = tcmu_cmd->cmd_id; > + entry->hdr.kflags = 0; > + entry->hdr.uflags = 0; > > /* > * Fix up iovecs, and handle if allocation in data ring wrapped. > @@ -464,6 +469,17 @@ static void tcmu_handle_completion(struct tcmu_cmd *cmd, struct tcmu_cmd_entry * > return; > } > > + if (entry->hdr.uflags & TCMU_UFLAG_UNKNOWN_OP) { > + UPDATE_HEAD(udev->data_tail, cmd->data_length, udev->data_size); > + pr_warn("TCMU: Userspace set UNKNOWN_OP flag on se_cmd %p\n", > + cmd->se_cmd); > + transport_generic_request_failure(cmd->se_cmd, > + TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE); > + cmd->se_cmd = NULL; > + kmem_cache_free(tcmu_cmd_cache, cmd); > + return; > + } > + > if (entry->rsp.scsi_status == SAM_STAT_CHECK_CONDITION) { > memcpy(se_cmd->sense_buffer, entry->rsp.sense_buffer, > se_cmd->scsi_sense_length); > @@ -542,14 +558,16 @@ static unsigned int tcmu_handle_completions(struct tcmu_dev *udev) > > tcmu_flush_dcache_range(entry, sizeof(*entry)); > > - if (tcmu_hdr_get_op(&entry->hdr) == TCMU_OP_PAD) { > - UPDATE_HEAD(udev->cmdr_last_cleaned, tcmu_hdr_get_len(&entry->hdr), udev->cmdr_size); > + if (tcmu_hdr_get_op(entry->hdr.len_op) == TCMU_OP_PAD) { > + UPDATE_HEAD(udev->cmdr_last_cleaned, > + tcmu_hdr_get_len(entry->hdr.len_op), > + udev->cmdr_size); > continue; > } > - WARN_ON(tcmu_hdr_get_op(&entry->hdr) != TCMU_OP_CMD); > + WARN_ON(tcmu_hdr_get_op(entry->hdr.len_op) != TCMU_OP_CMD); > > spin_lock(&udev->commands_lock); > - cmd = idr_find(&udev->commands, entry->cmd_id); > + cmd = idr_find(&udev->commands, entry->hdr.cmd_id); > if (cmd) > idr_remove(&udev->commands, cmd->cmd_id); > spin_unlock(&udev->commands_lock); > @@ -562,7 +580,9 @@ static unsigned int tcmu_handle_completions(struct tcmu_dev *udev) > > tcmu_handle_completion(cmd, entry); > > - UPDATE_HEAD(udev->cmdr_last_cleaned, tcmu_hdr_get_len(&entry->hdr), udev->cmdr_size); > + UPDATE_HEAD(udev->cmdr_last_cleaned, > + tcmu_hdr_get_len(entry->hdr.len_op), > + udev->cmdr_size); > > handled++; > } > @@ -840,14 +860,14 @@ static int tcmu_configure_device(struct se_device *dev) > udev->data_size = TCMU_RING_SIZE - CMDR_SIZE; > > mb = udev->mb_addr; > - mb->version = 1; > + mb->version = TCMU_MAILBOX_VERSION; > mb->cmdr_off = CMDR_OFF; > mb->cmdr_size = udev->cmdr_size; > > WARN_ON(!PAGE_ALIGNED(udev->data_off)); > WARN_ON(udev->data_size % PAGE_SIZE); > > - info->version = "1"; > + info->version = xstr(TCMU_MAILBOX_VERSION); > > info->mem[0].name = "tcm-user command & data buffer"; > info->mem[0].addr = (phys_addr_t) udev->mb_addr; > diff --git a/include/uapi/linux/target_core_user.h b/include/uapi/linux/target_core_user.h > index b483d19..b67f99d 100644 > --- a/include/uapi/linux/target_core_user.h > +++ b/include/uapi/linux/target_core_user.h > @@ -6,7 +6,7 @@ > #include <linux/types.h> > #include <linux/uio.h> > > -#define TCMU_VERSION "1.0" > +#define TCMU_VERSION "2.0" > > /* > * Ring Design > @@ -39,9 +39,13 @@ > * should process the next packet the same way, and so on. > */ > > -#define TCMU_MAILBOX_VERSION 1 > +#define TCMU_MAILBOX_VERSION 2 > #define ALIGN_SIZE 64 /* Should be enough for most CPUs */ > > +/* See https://gcc.gnu.org/onlinedocs/cpp/Stringification.html */ > +#define xstr(s) str(s) > +#define str(s) #s > + > struct tcmu_mailbox { > __u16 version; > __u16 flags; > @@ -64,31 +68,36 @@ enum tcmu_opcode { > * Only a few opcodes, and length is 8-byte aligned, so use low bits for opcode. > */ > struct tcmu_cmd_entry_hdr { > - __u32 len_op; > + __u32 len_op; > + __u16 cmd_id; > + __u8 kflags; > +#define TCMU_UFLAG_UNKNOWN_OP 0x1 > + __u8 uflags; > + > } __packed; > > #define TCMU_OP_MASK 0x7 > > -static inline enum tcmu_opcode tcmu_hdr_get_op(struct tcmu_cmd_entry_hdr *hdr) > +static inline enum tcmu_opcode tcmu_hdr_get_op(__u32 len_op) > { > - return hdr->len_op & TCMU_OP_MASK; > + return len_op & TCMU_OP_MASK; > } > > -static inline void tcmu_hdr_set_op(struct tcmu_cmd_entry_hdr *hdr, enum tcmu_opcode op) > +static inline void tcmu_hdr_set_op(__u32 *len_op, enum tcmu_opcode op) > { > - hdr->len_op &= ~TCMU_OP_MASK; > - hdr->len_op |= (op & TCMU_OP_MASK); > + *len_op &= ~TCMU_OP_MASK; > + *len_op |= (op & TCMU_OP_MASK); > } > > -static inline __u32 tcmu_hdr_get_len(struct tcmu_cmd_entry_hdr *hdr) > +static inline __u32 tcmu_hdr_get_len(__u32 len_op) > { > - return hdr->len_op & ~TCMU_OP_MASK; > + return len_op & ~TCMU_OP_MASK; > } > > -static inline void tcmu_hdr_set_len(struct tcmu_cmd_entry_hdr *hdr, __u32 len) > +static inline void tcmu_hdr_set_len(__u32 *len_op, __u32 len) > { > - hdr->len_op &= TCMU_OP_MASK; > - hdr->len_op |= len; > + *len_op &= TCMU_OP_MASK; > + *len_op |= len; > } > > /* Currently the same as SCSI_SENSE_BUFFERSIZE */ > @@ -97,13 +106,14 @@ static inline void tcmu_hdr_set_len(struct tcmu_cmd_entry_hdr *hdr, __u32 len) > struct tcmu_cmd_entry { > struct tcmu_cmd_entry_hdr hdr; > > - uint16_t cmd_id; > - uint16_t __pad1; > - > union { > struct { > + uint32_t iov_cnt; > + uint32_t iov_bidi_cnt; > + uint32_t iov_dif_cnt; > uint64_t cdb_off; > - uint64_t iov_cnt; > + uint64_t __pad1; > + uint64_t __pad2; > struct iovec iov[0]; > } req; > struct { (Adding MKP + MST + Paolo CC') So this doesn't quite follow what virtio-scsi request header is defining for allowing both pi_bytesout + pi_bytesin within a single op. But then again, virtio-scsi doesn't support BIDI commands either.. But if v2 TCMU ABI is support BIDI commands, then we'll also want to allow individual request headers that contain T10-PI metadata for both out/in payloads, yes..? --nab -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html