On 11/20/2015 02:52 AM, Hannes Reinecke wrote:
> One thing, though: I don't really agree with Bart's objection that
> moving to a workqueue would tie up too many resources.
> The thing is, I'm not convinced that using a workqueue allocates
> too many resources (we're talking about 460 vs. 240 bytes here).
> Also, we have to retry commands for quite some time (see the
> infamous NetApp takeover/giveback, which can take minutes).
> If we were to handle that without a workqueue, we'd have to initiate
> the retry from the end_io callback, causing quite deep stack
> recursion, which I'm not really fond of.

Hello Hannes,
Sorry if I wasn't clear enough in my previous e-mail about this topic,
but I'm more concerned about the additional memory needed for thread
stacks and thread control data structures than about the additional
memory needed for the workqueue. I'd like to see the ALUA device handler
implementation scale to thousands of LUNs and target port groups. If
all connections between an initiator and a target port group fail,
with a synchronous implementation of STPG we will either need a large
number of threads (in the case of one thread per STPG command) or the
STPG commands will be serialized (if there are fewer threads than port
groups). Neither alternative looks attractive to me.
BTW, not all storage arrays need STPG retries. Some arrays are able to
process an STPG command quickly (i.e., within a few seconds).
A previous discussion about this topic is available e.g. at
http://thread.gmane.org/gmane.linux.scsi/105340/focus=105601.
Bart.