Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

On 01/09/2015 12:28 PM, Hannes Reinecke wrote:
> On 01/09/2015 07:00 PM, Michael Christie wrote:
>>
>> On Jan 8, 2015, at 11:03 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
>>
>>> On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
>>>> On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
>>>>> On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
>>>>>> On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
>>>
>>> <SNIP>
>>>
>>>>> The point is that a simple session wide counter for command sequence
>>>>> number assignment is significantly less overhead than all of the
>>>>> overhead associated with running a full multipath stack atop multiple
>>>>> sessions.
>>>>
>>>> I don't see how that's relevant to issue speed, which was the measure we
>>>> were using: the layers above are just a hopper.  As long as they're
>>>> loaded, the MQ lower layer can issue at full speed.  So as long as the
>>>> multipath hopper is efficient enough to keep the queues loaded, there's
>>>> no speed degradation.
>>>>
>>>> The problem with a sequence point inside the MQ issue layer is that it
>>>> can cause a stall that reduces the issue speed, so the counter sequence
>>>> point causes a degraded issue speed compared to the multipath hopper
>>>> approach above, even if the multipath approach has a higher CPU overhead.
>>>>
>>>> Now, if the system is close to 100% CPU already, *then* the multipath
>>>> overhead will try to take CPU power we don't have and cause a stall, but
>>>> that only happens in the flat-out CPU case.
>>>>
>>>>> Not to mention that our iSCSI/iSER initiator is already taking a session
>>>>> wide lock when sending outgoing PDUs, so adding a session wide counter
>>>>> isn't adding any additional synchronization overhead vs. what's already
>>>>> in place.
>>>>
>>>> I'll leave it up to the iSER people to decide whether they're redoing
>>>> this as part of the MQ work.
>>>>
>>>
>>> Session wide command sequence number synchronization isn't something to
>>> be removed as part of the MQ work.  It's an iSCSI/iSER protocol
>>> requirement.
>>>
>>> That is, the expected and maximum sequence numbers (ExpCmdSN and
>>> MaxCmdSN) are returned as part of every response PDU, which the
>>> initiator uses to determine when the command sequence number window is
>>> open, so that new non-immediate commands may be sent to the target.
>>>
>>> So, given some manner of session wide synchronization is required
>>> between different contexts for the existing single connection case to
>>> update the command sequence number and check when the window opens, it's
>>> a fallacy to claim MC/S adds some type of new initiator specific
>>> synchronization overhead vs. single connection code.
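
Just to make that serialization point concrete: the single connection
window check today is conceptually something like the sketch below.
This is simplified userspace-style C with made-up names, not the
actual libiscsi code:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of session wide CmdSN window handling.  All names here are
 * hypothetical and for illustration only.
 */
struct iscsi_session_sketch {
	pthread_mutex_t lock;	/* the session wide lock */
	uint32_t cmdsn;		/* next CmdSN to assign */
	uint32_t exp_cmdsn;	/* ExpCmdSN from the last response PDU */
	uint32_t max_cmdsn;	/* MaxCmdSN from the last response PDU */
};

/* Serial number arithmetic style compare: a <= b for 32-bit SNs. */
static bool sna_lte(uint32_t a, uint32_t b)
{
	return a == b || (int32_t)(a - b) < 0;
}

/*
 * Every context queueing a non-immediate command takes the session
 * lock, checks that the window is open (CmdSN <= MaxCmdSN), and bumps
 * the counter.  This is the session wide sequence point under
 * discussion.
 */
static bool try_assign_cmdsn(struct iscsi_session_sketch *s, uint32_t *out)
{
	bool ok = false;

	pthread_mutex_lock(&s->lock);
	if (sna_lte(s->cmdsn, s->max_cmdsn)) {
		*out = s->cmdsn++;
		ok = true;
	}
	pthread_mutex_unlock(&s->lock);
	return ok;	/* false: window closed, caller has to wait */
}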
>>
>> I think you are assuming we are leaving the iSCSI code as it is today.
>>
>> For the non-MC/S MQ session-per-CPU design, we would be allocating and
>> binding the session and its resources to specific CPUs. They would only
>> be accessed by the threads on that one CPU, so we get our
>> serialization/synchronization from that. That is why we are saying we
>> do not need something like atomic_t/spinlocks for the sequence number
>> handling for this type of implementation.
>>
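
To be concrete about that last part: if a session and its CmdSN counter
are only ever touched from one CPU, the window check drops to plain
loads and stores. A rough sketch; the per-CPU layout and all names here
are invented, not real code:

#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64	/* made up; stands in for nr_cpu_ids */

/*
 * One session per CPU.  Only threads pinned to the owning CPU ever
 * touch its entry, so no atomic_t or spinlock is needed for CmdSN
 * assignment.
 */
struct iscsi_percpu_session_sketch {
	uint32_t cmdsn;		/* next CmdSN for this CPU's session */
	uint32_t max_cmdsn;	/* from response PDUs read on this CPU */
};

static struct iscsi_percpu_session_sketch sessions[NR_CPUS_SKETCH];

static bool try_assign_cmdsn_percpu(unsigned int cpu, uint32_t *out)
{
	struct iscsi_percpu_session_sketch *s = &sessions[cpu];

	/* Plain non-atomic ops are safe: single CPU access only. */
	if ((int32_t)(s->cmdsn - s->max_cmdsn) > 0)
		return false;	/* CmdSN > MaxCmdSN: window closed */
	*out = s->cmdsn++;
	return true;
}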
> Wouldn't that need to be coordinated with the networking layer?

Yes.

> Doesn't it do the same thing, matching TX/RX queues to CPUs?

Yes.

> If so, wouldn't we decrease bandwidth by restricting things to one CPU?

We have a session or connection per CPU, though, so we end up hitting
the same problem you talked about last year, where one hctx (an iSCSI
session or connection's socket, or a NIC hw queue) could get
overloaded. This is what I meant in my original mail when I said iSCSI
would rely on whatever blk-mq load balancers we end up implementing at
that layer to balance requests across hctxs.
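
To sketch the kind of balancing I mean (nothing like this exists in
blk-mq today; the policy and all names are invented): instead of a
purely static cpu -> hctx mapping, a balancer could steer submissions
away from an overloaded hw queue, e.g.:

#include <stdint.h>

#define NR_HCTX_SKETCH 8	/* made-up hw queue count */

struct hctx_sketch {
	uint32_t inflight;	/* requests currently queued on this hctx */
};

static struct hctx_sketch hctxs[NR_HCTX_SKETCH];

/*
 * Fast path keeps the usual static cpu -> hctx mapping; if the home
 * queue is backed up past 'overload', fall back to the least loaded
 * hw queue instead of piling on.
 */
static unsigned int pick_hctx(unsigned int cpu, uint32_t overload)
{
	unsigned int home = cpu % NR_HCTX_SKETCH;	/* static map */
	unsigned int best = home, i;

	if (hctxs[home].inflight < overload)
		return home;		/* stay local */

	for (i = 0; i < NR_HCTX_SKETCH; i++)
		if (hctxs[i].inflight < hctxs[best].inflight)
			best = i;
	return best;
}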


