On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
> On 01/07/15 22:39, Mike Christie wrote:
> > On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
> >> On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
> >>> Hi everyone,
> >>>
> >>> Now that scsi-mq is fully included, we need an iSCSI initiator that
> >>> would use it to achieve scalable performance. The need is even
> >>> greater for iSCSI offload devices and transports that support
> >>> multiple HW queues. As iSER maintainer I'd like to discuss the way
> >>> we would choose to implement that in iSCSI.
> >>>
> >>> My measurements show that the iSER initiator can scale up to ~2.1M
> >>> IOPs with multiple sessions but only ~630K IOPs with a single
> >>> session, where the most significant bottleneck is the (single) core
> >>> processing completions.
> >>>
> >>> In the existing single-connection-per-session model, given that
> >>> command ordering must be preserved session-wide, we end up with
> >>> serial command execution over a single connection, which is
> >>> basically a single-queue model. The best fit seems to be plugging
> >>> iSCSI MC/S in as a multi-queued SCSI LLDD. In this model, a
> >>> hardware context would have a 1:1 mapping with an iSCSI connection
> >>> (a TCP socket or a HW queue).
> >>>
> >>> iSCSI MC/S and its role in the presence of the dm-multipath layer
> >>> has been discussed several times in the past decade(s). The basic
> >>> need for MC/S is implementing a multi-queue data path, so perhaps
> >>> we may want to avoid doing any kind of link aggregation or load
> >>> balancing, so as not to overlap with dm-multipath. For example, we
> >>> can implement ERL=0 (which is basically the scsi-mq ERL) and/or
> >>> restrict a session to a single portal.
> >>>
> >>> As I see it, the todos are:
> >>> 1. Getting MC/S to work (kernel + user-space) with ERL=0 and
> >>>    round-robin connection selection (per SCSI command execution).
> >>> 2. Plugging into scsi-mq - exposing num_connections as nr_hw_queues
> >>>    and using blk-mq based queue (conn) selection.
> >>> 3. Reworking the iSCSI core locking scheme to avoid session-wide
> >>>    locking as much as possible.
> >>> 4. Using the blk-mq pre-allocation and tagging facilities.
> >>>
> >>> I've recently started looking into this. I would like the community
> >>> to agree (or debate) on this scheme and also talk about
> >>> implementation with anyone who is also interested in this.
> >>>
> >> Yes, that's a really good topic.
> >>
> >> I've pondered implementing MC/S for iSCSI/TCP, but then I figured my
> >> network implementation knowledge doesn't stretch that far.
> >> So yeah, a discussion here would be good.
> >>
> >> Mike? Any comments?
> >
> > I have been working under the assumption that people would be OK with
> > MC/S upstream if we only use it to handle the case where we want
> > something like a TCP/iSCSI connection per CPU, and then map each
> > connection to a blk_mq_hw_ctx. In this more limited MC/S
> > implementation there would be no iSCSI-layer code to load balance
> > across ports or transport paths the way dm-multipath does, so there
> > would be no feature/code duplication. For balancing across hctxs,
> > the iSCSI layer would likewise defer to whatever we end up with in
> > the upper layers, so again no feature/code duplication with upper
> > layers.
> >
> > So pretty non-controversial, I hope :)
> >
> > If people want to add something like round-robin connection
> > selection in the iSCSI layer, then I think we want to leave that for
> > after the initial merge, so people can argue about that separately.
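To make the connection-per-hctx mapping Mike describes concrete, here
is a minimal sketch of what the submission side could look like,
assuming the host was registered with shost->nr_hw_queues set to the
session's connection count (Sagi's todo #2). The conns[] array and
iscsi_mq_xmit_cmd() are hypothetical names for illustration, not
existing libiscsi code; blk_mq_unique_tag() and
blk_mq_unique_tag_to_hwq() are the real blk-mq tagging helpers:

    #include <linux/blk-mq.h>     /* blk_mq_unique_tag{,_to_hwq}() */
    #include <scsi/scsi_host.h>
    #include <scsi/libiscsi.h>

    static int iscsi_mq_queuecommand(struct Scsi_Host *shost,
                                     struct scsi_cmnd *sc)
    {
            /* Assumes the session is reachable from shost private data. */
            struct iscsi_session *session = shost_priv(shost);
            /* blk-mq tagging (todo #4) encodes the hw queue in the tag. */
            u32 tag = blk_mq_unique_tag(sc->request);
            u16 hwq = blk_mq_unique_tag_to_hwq(tag);
            /* conns[] is hypothetical; today libiscsi has only leadconn. */
            struct iscsi_conn *conn = session->conns[hwq];

            /*
             * One connection per hw context means the send path needs no
             * cross-connection locking apart from CmdSN assignment (see
             * the sketch at the end of this mail).
             */
            return iscsi_mq_xmit_cmd(conn, sc);  /* hypothetical helper */
    }

Since blk-mq already spreads hardware contexts across CPUs, pinning one
connection per hctx gives per-CPU submission without any
connection-selection logic in the iSCSI layer itself.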
> Hello Sagi and Mike,
>
> I agree with Sagi that adding scsi-mq support in the iSER initiator
> would help iSER users, because it would allow them to configure a
> single iSER target and use the multiqueue feature instead of having
> to configure multiple iSER targets to spread the workload over
> multiple CPUs at the target side.
>
> And I agree with Mike that implementing scsi-mq support in the iSER
> initiator as multiple independent connections is probably a better
> choice than MC/S. This is because RFC 3720 requires that iSCSI
> command numbering is session-wide, which means maintaining a single
> counter shared by all connections in an MC/S session. Such a counter
> would be a contention point. I'm afraid that because of that counter,
> performance on a multi-socket initiator system with a scsi-mq
> implementation based on MC/S could be worse than with the
> multiple-iSER-target approach. Hence my preference for an approach
> based on multiple independent iSER connections instead of MC/S.

The idea that a simple session-wide counter for command sequence number
assignment adds such a degree of contention that it puts MC/S at a
performance disadvantage vs. multi-session configurations (with all of
the extra multipath logic overhead on top) is, at best, a naive
proposition. On the initiator side of MC/S, literally the only thing
that needs to be serialized is the assignment of the command sequence
number to individual non-immediate PDUs. The sending of the outgoing
PDUs + immediate data by the initiator can happen out of order, and it
is up to the target to ensure that the submission of the commands to
the device server happens in command sequence number order. All of the
actual immediate data + R2T -> data-out processing by the target can be
done out of order as well.

--nab
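To put nab's point in code: the entire session-wide serialization on
the MC/S submission path can be confined to a window like the one
below. This is a minimal sketch, not existing libiscsi code; the
helper and the cmdsn_lock field are hypothetical names, while struct
iscsi_scsi_req and session->cmdsn are the existing ones. Per RFC 3720,
immediate PDUs carry the current CmdSN but do not advance it:

    #include <linux/spinlock.h>
    #include <scsi/iscsi_proto.h>   /* struct iscsi_scsi_req */
    #include <scsi/libiscsi.h>      /* struct iscsi_session */

    static void iscsi_assign_cmdsn(struct iscsi_session *session,
                                   struct iscsi_scsi_req *hdr,
                                   bool immediate)
    {
            /* cmdsn_lock: hypothetical name for the session-wide lock. */
            spin_lock_bh(&session->cmdsn_lock);
            hdr->cmdsn = cpu_to_be32(session->cmdsn);
            if (!immediate)
                    session->cmdsn++;  /* only non-immediate PDUs advance CmdSN */
            spin_unlock_bh(&session->cmdsn_lock);

            /*
             * Everything past this point (building and sending the PDU
             * plus any immediate data) can proceed per connection, out
             * of order; the target reorders by CmdSN before handing
             * commands to the device server.
             */
    }

The open question in this thread is whether those few instructions
under a session lock become a measurable hot spot at the ~2M IOPs Sagi
measured, or stay cheaper than stacking dm-multipath on top of multiple
independent sessions.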