Re: Ring buffer in TCMU, and queue depth/max sectors in Loopback device

Sheng Yang <sheng@xxxxxxxxxx> · Wed, 13 Jan 2016 14:25:07 -0800

On Wed, Jan 13, 2016 at 11:00 AM, Andy Grover <agrover@xxxxxxxxxx> wrote:
> On 01/12/2016 05:11 PM, Sheng Yang wrote:
>>
>> I am currently working on a project using TCMU in production, and
>> recently ran into an issue related to both TCMU and the loopback device.
>>
>> After connecting to TCMU successfully, I immediately noticed that the cmd
>> ring and data ring were filling up very quickly. I am using TCMU as the
>> backend and a loopback device as the frontend.
>>
>> If I understand correctly, hw_max_sectors should be the maximum number of
>> sectors (in 512-byte units or in block size? more likely in block size)
>> one SCSI command can use, and hw_queue_depth is the maximum number of
>> commands that the hardware can have queued at any given moment.
>
>
> This is all to support the MAXIMUM TRANSFER LENGTH field in VPD page b0.
> The spec says it's supposed to be in logical blocks (naturally), but from
> looking at target_core_spc.c line 527 we are returning hw_max_sectors
> except in the qla2xxx case. This is a bug. We should be dividing it by
> dev_attrib.block_size, which we can assume is in sectors-per-block.

Thanks for explaining.
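
To check that I read the fix correctly, here is a rough sketch of the
conversion, in Python rather than kernel C. It assumes hw_max_sectors
counts 512-byte sectors and that the block size is given in bytes; both
are assumptions on my part, and the function name is made up for
illustration. If block_size really is sectors-per-block, the divisor is
simply that value.

```python
SECTOR_SIZE = 512  # assumed unit of hw_max_sectors, in bytes


def max_transfer_length_blocks(hw_max_sectors, block_size_bytes):
    """Hypothetical helper: hw_max_sectors expressed in logical blocks."""
    if block_size_bytes <= 0 or block_size_bytes % SECTOR_SIZE:
        raise ValueError("block size must be a positive multiple of 512")
    sectors_per_block = block_size_bytes // SECTOR_SIZE
    # Round down so the advertised limit never exceeds what the backend takes.
    return hw_max_sectors // sectors_per_block
```

With a 4096-byte block size, 128 sectors would be reported as 16 blocks
instead of 128.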

>
>> Both parameters would affect
>> the size of the ring buffer in TCMU, since I assume the ring buffer should
>> be able to hold all the commands/data sent from the block layer.
>
>
> Well, maybe. Using queue_depth seems to assume there is some hardware
> resource that is used per request, but TCMU's data area is a resource that
> may only fit two giant requests, or may fit 1024 tiny requests. queue_depth
> is a rate-control method between the initiator and target. Ring full is also
> a rate-control method. Ring full seems preferable, because it accurately
> stops processing when the resource (the tcmu ring buffer) actually is full.
> It also is within a single machine instead of across a fabric, so latency to
> resume processing will be less, I'd think.

What bothers me is what happens if the block layer sends commands
faster than the TCMU userspace handler can process them, e.g. due to a
temporary network interruption or degradation. That may result in
command timeouts, and thus in errors in the upper layers as well. Since
the block layer only sends up to queue_depth commands at a time, it
won't submit more if the ring can hold all of them; it would just wait
for userspace to catch up, which is a more graceful way of handling
things.
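
To make the two views concrete, here is a small sketch in Python. The
1 MiB data area is a made-up figure, not the real TCMU size, the helper
names are hypothetical, and cmd ring entry overhead is ignored.

```python
def commands_that_fit(data_area_bytes, request_bytes):
    """How many requests of one size fit in the data area (hypothetical)."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return data_area_bytes // request_bytes


def ring_can_hold_queue(data_area_bytes, queue_depth, max_request_bytes):
    """True if queue_depth worst-case requests fit at once (hypothetical)."""
    return queue_depth * max_request_bytes <= data_area_bytes


DATA_AREA = 1 << 20  # assumed 1 MiB data area, for illustration only

giant = commands_that_fit(DATA_AREA, 512 * 1024)  # 512 KiB requests
tiny = commands_that_fit(DATA_AREA, 1024)         # 1 KiB requests
```

The same area holds either 2 giant or 1024 tiny requests, which is
Andy's point. My point is that ring_can_hold_queue() being true would
mean the block layer never sees ring full at all.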

>
> So although I don't think it makes sense to size the ring buffer as
> queue_depth * max_request_size, we can certainly make the ring buffer
> larger. A larger buffer will allow more in-flight cmds and data but if the
> userspace handler is actually slower than the requests then it doesn't
> actually improve IOPs, you just get bufferbloat. The downside, at least
> currently, is wasting memory if the userspace handler *can* keep up with
> requests that arrive, plus the context switch.
>
> Proper sizing of the ring buffer, or potentially dynamically sizing it (the
> mmaped area is fixed-size but we need not actually back the entire area with
> pages) is something the initial TCMU implementation did not address, but
> something we may want to look at now, especially now that there is starting
> to be some real-world usage to base decisions on.

Sure, and I would like to look into the possibility of zero-copy as
well. There are many things we could potentially improve in TCMU.

>
>> My questions are:
>>
>> 1. Are hw_max_sectors and hw_queue_depth meant to be used to control
>> the max_sectors and queue_depth of the device? If so, how would they
>> work? And do we have a way to configure them from userspace if they
>> do work? It seems they would have to be validated against the ring
>> buffer size.
>
>
> From what I can tell, hw_max_sectors is useful to set so hardware
> limitations are not exceeded. The only reason I could see TCMU needing it
> was to avoid one request larger than the entire ring buffer size.
>
> I see queue depth being used for sizing the iscsi cmdsn window, but don't
> see how it is relevant for TCMU.
>
>> 2. Or is it the loopback device we should really configure in
>> userspace? I can configure queue_depth in userspace through
>> */sys/bus/scsi/devices/<SCSI device>/queue_depth*, though I don't know
>> how to configure max_sectors yet. I could probably try
>> "/sys/block/xxx/queue/max_sectors_kb", but I don't know if that's
>> expected.
>>
>> Please correct me if I got anything wrong. I really hope to get this
>> piece of code working flawlessly, and would be happy to contribute. But
>> I am not sure what the correct way to fix it is.
>
>
> Just to repeat what I said above, I think loopback seeing TCMU ring full is
> ok. It doesn't make sense to have a restrictive queue_depth because TCMU
> doesn't have a number-of-cmds limitation, it has a total-data-size-of-cmds
> limitation.
>
> Regards -- Andy
>

As I said, I am worried that a full ring buffer results in the kernel
reporting hardware errors; at least that can be prevented by using a
buffer sized according to queue_depth and max_sectors.
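
A rough sketch of what I mean, in Python for brevity. It reads the two
sysfs attributes mentioned above and multiplies them to get a worst-case
data size. The helper names and the sysfs_root parameter are made up for
illustration, and it assumes max_sectors_kb is the right per-request
limit and ignores cmd ring overhead, so treat the result as an estimate.

```python
import os


def read_int(path):
    """Read a single integer from a sysfs-style attribute file."""
    with open(path) as f:
        return int(f.read().strip())


def worst_case_data_bytes(scsi_dev, block_dev, sysfs_root="/sys"):
    """Hypothetical estimate: queue_depth requests of maximum size."""
    queue_depth = read_int(os.path.join(
        sysfs_root, "bus/scsi/devices", scsi_dev, "queue_depth"))
    max_sectors_kb = read_int(os.path.join(
        sysfs_root, "block", block_dev, "queue/max_sectors_kb"))
    # max_sectors_kb is in KiB, so convert to bytes before multiplying.
    return queue_depth * max_sectors_kb * 1024
```

If the data area is at least this large, the block layer should never
have more data in flight than the ring can hold.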

Thanks.

--Sheng