Re: Ring buffer in TCMU, and queue depth/max sectors in Loopback device

Andy Grover <agrover@xxxxxxxxxx> · Wed, 13 Jan 2016 11:00:55 -0800

On 01/12/2016 05:11 PM, Sheng Yang wrote:
I am currently working on a project that uses TCMU in production, and
recently ran into an issue involving both TCMU and the loopback device.

After connecting to TCMU successfully, I immediately noticed that the cmd
ring and data ring filled up very quickly. I am using TCMU as the backend
and the loopback device as the frontend.

If I understand correctly, hw_max_sectors should be the maximum number of
sectors (in 512-byte units or in block size? more likely in block size)
that one SCSI command can use, and hw_queue_depth is the maximum number
of commands that the hardware can have queued at any given moment.

This all exists to support the MAXIMUM TRANSFER LENGTH field in VPD page
b0. The spec says it's supposed to be in logical blocks (naturally), but
from looking at target_core_spc.c line 527 we are returning
hw_max_sectors, except in the qla2xxx case. This is a bug. We should be
dividing it by dev_attrib.block_size, which we can assume is in
sectors-per-block.
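
To make the unit conversion concrete, here is a minimal sketch, not the
kernel code. It assumes hw_max_sectors counts 512-byte sectors and that
block_size is the logical block size in bytes; both are assumptions, and
the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 * Assumes hw_max_sectors counts 512-byte sectors and block_size is the
 * logical block size in bytes (a nonzero multiple of 512).
 */
static uint32_t max_transfer_len_blocks(uint32_t hw_max_sectors,
                                        uint32_t block_size)
{
    assert(block_size >= 512 && block_size % 512 == 0);

    /* Number of 512-byte sectors that make up one logical block. */
    uint32_t sectors_per_block = block_size / 512;

    /* MAXIMUM TRANSFER LENGTH is reported in logical blocks. */
    return hw_max_sectors / sectors_per_block;
}
```

Under those assumptions, 1024 sectors with 4096-byte blocks would be
reported as 128 logical blocks, and with 512-byte blocks as 1024.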

Both parameters would affect the size of the ring buffer in TCMU, since I
assume the ring buffer should be able to hold all the commands/data sent
from the block layer.

Well, maybe. Using queue_depth seems to assume there is some hardware 
resource that is used per request, but TCMU's data area is a resource 
that may only fit two giant requests, or may fit 1024 tiny requests. 
queue_depth is a rate-control method between the initiator and target. 
Ring full is also a rate-control method. Ring full seems preferable,
because it accurately stops processing when the resource (the TCMU ring
buffer) is actually full. It also operates within a single machine
instead of across a fabric, so latency to resume processing will be less,
I'd think.
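
As a toy model of why a command count is the wrong limit, the sketch
below admits requests against a fixed-size data area until it is full.
It is not the TCMU implementation; the struct, the function names, and
the 1 MiB area size used in the note afterwards are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a fixed-size data area; not the TCMU implementation. */
struct data_area {
    size_t size;  /* total bytes available */
    size_t used;  /* bytes held by in-flight commands */
};

/* Admit a command only if its data fits; false models "ring full". */
static bool data_area_reserve(struct data_area *da, size_t len)
{
    assert(da->used <= da->size);
    if (len > da->size - da->used)
        return false;
    da->used += len;
    return true;
}

/* Count how many equal-sized requests fit before the area reports full. */
static size_t requests_until_full(size_t area_size, size_t req_len)
{
    struct data_area da = { .size = area_size, .used = 0 };
    size_t n = 0;

    /* A zero-length request would never fill the area, so skip it. */
    while (req_len > 0 && data_area_reserve(&da, req_len))
        n++;
    return n;
}
```

With a hypothetical 1 MiB area, 512 KiB requests fill it after two
commands, while 1 KiB requests fill it after 1024. The limit is total
bytes, not number of commands.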

So although I don't think it makes sense to size the ring buffer as 
queue_depth * max_request_size, we can certainly make the ring buffer 
larger. A larger buffer will allow more in-flight cmds and data, but if
the userspace handler is slower than the incoming requests then it
doesn't actually improve IOPS; you just get bufferbloat. The downside,
at least currently, is wasting memory if the userspace handler *can*
keep up with requests that arrive, plus the context switch.
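
A rough way to picture the bufferbloat effect: if the handler drains
data at a fixed rate, a request queued behind a full buffer waits about
buffered bytes divided by that rate. The sketch below is only that
arithmetic, under the simplifying assumption of a constant handler
throughput; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Approximate time (in seconds) a newly queued request waits behind
 * data that is already buffered, assuming the handler drains at a
 * constant rate. A simplification for illustration only.
 */
static double queue_delay_seconds(double buffered_bytes,
                                  double handler_bytes_per_sec)
{
    assert(handler_bytes_per_sec > 0.0);
    return buffered_bytes / handler_bytes_per_sec;
}
```

Doubling the amount of buffered data doubles the wait, while the
handler's throughput, and so the IOPS, stays the same.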

Proper sizing of the ring buffer, or potentially dynamically sizing it 
(the mmaped area is fixed-size but we need not actually back the entire 
area with pages) is something the initial TCMU implementation did not 
address, but something we may want to look at now, especially now that 
there is starting to be some real-world usage to base decisions on.

My questions are:

1. Are hw_max_sectors and hw_queue_depth meant to control the
max_sectors and queue_depth of the device? If so, how would they
work? And is there a way to configure them from userspace if they
do? It seems they would have to be validated so that they do not
exceed the ring buffer size.

From what I can tell, hw_max_sectors is useful to set so hardware 
limitations are not exceeded. The only reason I could see TCMU needing 
it was to avoid one request larger than the entire ring buffer size.
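
A minimal sketch of that one check, assuming hw_max_sectors counts
512-byte sectors and that the size of the data area is known in bytes.
The function name and parameters are hypothetical; this is not what the
TCMU code does today.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: cap the advertised maximum so that a single
 * request can never be larger than the whole data area.
 * Assumes hw_max_sectors counts 512-byte sectors.
 */
static uint32_t clamp_max_sectors(uint32_t hw_max_sectors,
                                  size_t data_area_bytes)
{
    /* Largest request, in 512-byte sectors, that fits in the area. */
    size_t fit = data_area_bytes / 512;

    if (fit > UINT32_MAX)
        fit = UINT32_MAX;
    return hw_max_sectors < fit ? hw_max_sectors : (uint32_t)fit;
}
```

For example, with a hypothetical 1 MiB data area a setting of 4096
sectors would be capped to 2048, while a setting of 128 is unchanged.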

I see queue depth being used for sizing the iscsi cmdsn window, but 
don't see how it is relevant for TCMU.

2. Or is it the loopback device that we should really configure in
userspace? I can configure queue_depth in userspace through
*/sys/bus/scsi/devices/<SCSI device>/queue_depth*, though I don't know
how to configure max_sectors yet. I could probably try
"/sys/block/xxx/queue/max_sectors_kb", but I don't know if that's
the expected way.
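
For reference, both attributes named above are plain text files, so
setting them from a program amounts to writing a decimal number. The
sketch below is a generic helper with a hypothetical name. It makes no
claim about which of the two attributes is the right one to tune here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an unsigned value to a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure (bad path, no permission, or the
 * write being rejected).
 */
static int write_attr_uint(const char *path, unsigned int value)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%u\n", value) > 0;

    /* fclose() flushes the buffer, so a rejected value can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

It would be called with one of the paths above, with the placeholder
replaced by a real device name and a value chosen by the administrator.
Writing to sysfs normally requires root.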

Please correct me if I got anything wrong. I really hope to get this
piece of code working flawlessly, and would be happy to contribute. But I
am not sure what the correct way to fix it is.

Just to repeat what I said above, I think loopback seeing TCMU ring full 
is ok. It doesn't make sense to have a restrictive queue_depth because 
TCMU doesn't have a number-of-cmds limitation, it has a 
total-data-size-of-cmds limitation.

Regards -- Andy
