Re: Ring buffer in TCMU, and queue depth/max sectors in Loopback device

On 01/12/2016 05:11 PM, Sheng Yang wrote:
I am currently working on a project using TCMU in production, and got
an issue related to both TCMU and loopback device recently.

After connecting to TCMU successfully, I immediately noticed that the cmd
ring and data ring were filled very quickly. I am using TCMU as the backend
and the loopback device as the frontend.

If I understand right, hw_max_sectors should be the maximum number of
sectors (in 512 bytes or in block size? more likely in block size) one SCSI
command can use, and hw_queue_depth is the maximum number of commands that
the hardware can queue at any given moment.

This is all to support the MAXIMUM TRANSFER LENGTH field in VPD page b0. The spec says it's supposed to be in logical blocks (naturally), but from looking at target_core_spc.c line 527, we are returning hw_max_sectors except in the qla2xxx case. This is a bug. We should be dividing it by dev_attrib.block_size, which we can assume is in sectors-per-block.
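
A minimal sketch of the conversion described above, not the kernel code. It assumes hw_max_sectors counts 512-byte sectors and that the logical block size is known in bytes; both are assumptions, and the function name is hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector; assumed unit of hw_max_sectors


def max_transfer_length_blocks(hw_max_sectors, block_size_bytes):
    """Convert a limit given in 512-byte sectors into logical blocks."""
    if block_size_bytes <= 0 or block_size_bytes % SECTOR_SIZE != 0:
        raise ValueError("block size must be a positive multiple of 512")
    # How many 512-byte sectors make up one logical block.
    sectors_per_block = block_size_bytes // SECTOR_SIZE
    # Round down so the advertised limit never exceeds the real one.
    return hw_max_sectors // sectors_per_block
```

With 512-byte blocks the value is unchanged; with 4096-byte blocks it shrinks by a factor of 8, which is the case where reporting raw sectors would overstate the limit.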

Both parameters would affect
the size of the ring buffer in TCMU, since I assume the ring buffer should
be able to hold all the commands/data sent from the block layer.

Well, maybe. Using queue_depth seems to assume there is some hardware resource that is used per request, but TCMU's data area is a resource that may only fit two giant requests, or may fit 1024 tiny requests. queue_depth is a rate-control method between the initiator and target. Ring full is also a rate-control method. Ring full seems preferable, because it accurately stops processing when the resource (the tcmu ring buffer) actually is full. It also is within a single machine instead of across a fabric, so latency to resume processing will be less, I'd think.
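
To illustrate why a command count is a poor proxy for data-area usage, here is a toy model of a byte-budgeted data area. It is only a sketch: the class, the function, and the 1 MiB capacity used in the example are hypothetical and do not reflect the real TCMU layout.

```python
class DataArea:
    """Toy model of a fixed-size data area shared by in-flight requests."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0

    def try_reserve(self, nbytes):
        # "Ring full": refuse the request when its data does not fit.
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def release(self, nbytes):
        # Called when the handler completes a request.
        self.used = max(0, self.used - nbytes)


def count_admitted(capacity_bytes, request_bytes):
    """Count how many equal-sized requests fit before the area is full."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    area = DataArea(capacity_bytes)
    admitted = 0
    while area.try_reserve(request_bytes):
        admitted += 1
    return admitted
```

With a hypothetical 1 MiB area, count_admitted(1 << 20, 512 * 1024) admits 2 requests while count_admitted(1 << 20, 1024) admits 1024, so no single queue_depth value describes both workloads.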

So although I don't think it makes sense to size the ring buffer as queue_depth * max_request_size, we can certainly make the ring buffer larger. A larger buffer will allow more in-flight cmds and data, but if the userspace handler is slower than the rate at which requests arrive, it doesn't actually improve IOPS; you just get bufferbloat. The downside, at least currently, is wasting memory if the userspace handler *can* keep up with requests that arrive, plus the context switch.

Proper sizing of the ring buffer, or potentially dynamically sizing it (the mmaped area is fixed-size but we need not actually back the entire area with pages) is something the initial TCMU implementation did not address, but something we may want to look at now, especially now that there is starting to be some real-world usage to base decisions on.

My questions are:

1. Are hw_max_sectors and hw_queue_depth meant to be used to control
the max_sectors and queue_depth of the device? If so, how would they
work? And do we have a way to configure them in userspace if they
work? It seems they have to be verified not to exceed the ring buffer
size.

From what I can tell, hw_max_sectors is useful to set so hardware limitations are not exceeded. The only reason I could see TCMU needing it was to avoid one request larger than the entire ring buffer size.
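
The one constraint mentioned above, that a single request must not be larger than the whole buffer, could be sketched as a simple clamp. This is a simplification: it assumes 512-byte sectors, ignores command entries and alignment overhead, and the function name is hypothetical.

```python
def clamp_hw_max_sectors(hw_max_sectors, data_area_bytes, sector_size=512):
    """Limit hw_max_sectors so one request can never exceed the data area."""
    # Largest request, in sectors, that fits in an otherwise empty data area.
    ring_limit = data_area_bytes // sector_size
    return min(hw_max_sectors, ring_limit)
```

A value already below the ring limit passes through unchanged; only an oversized setting is reduced.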

I see queue depth being used for sizing the iscsi cmdsn window, but don't see how it is relevant for TCMU.

2. Or is it the loopback device we should really configure in
userspace? I can configure queue_depth in userspace through
*/sys/bus/scsi/devices/<SCSI device>/queue_depth*, though I don't know
how to configure max_sectors yet. I could probably try
"/sys/block/xxx/queue/max_sectors_kb", but I don't know if that's
expected.
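
For reference, a small sketch of inspecting and adjusting the block-layer limit through sysfs. The max_sectors_kb and max_hw_sectors_kb attributes are standard block queue files; the helper names and the sysfs_root parameter (used so the code can be pointed at a test directory) are hypothetical, and writing the real file needs root. Whether this is the intended way to tune a loopback LUN is exactly the open question above.

```python
from pathlib import Path


def read_queue_limits(disk, sysfs_root="/sys"):
    """Return max_sectors_kb and max_hw_sectors_kb for a block device."""
    queue = Path(sysfs_root) / "block" / disk / "queue"
    limits = {}
    for name in ("max_sectors_kb", "max_hw_sectors_kb"):
        limits[name] = int((queue / name).read_text().strip())
    return limits


def set_max_sectors_kb(disk, value_kb, sysfs_root="/sys"):
    """Write max_sectors_kb, refusing values above the hardware limit."""
    queue = Path(sysfs_root) / "block" / disk / "queue"
    hw_limit = int((queue / "max_hw_sectors_kb").read_text().strip())
    if not 0 < value_kb <= hw_limit:
        raise ValueError(f"value must be in 1..{hw_limit} KiB")
    (queue / "max_sectors_kb").write_text(f"{value_kb}\n")
```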

Please correct me if I got anything wrong. I really hope to get this
piece of code to work flawlessly, and would be happy to contribute. But I
am not sure what the correct way to fix it is.

Just to repeat what I said above, I think loopback seeing TCMU ring full is ok. It doesn't make sense to have a restrictive queue_depth because TCMU doesn't have a number-of-cmds limitation, it has a total-data-size-of-cmds limitation.

Regards -- Andy
