Re: [dm-devel] multipath_busy() stalls IO due to scsi_host_is_busy()

Bernd Schubert <bernd.schubert@xxxxxxxxxxxxxxxxxx> · Wed, 16 May 2012 17:54:45 +0200

On 05/16/2012 05:27 PM, Mike Christie wrote:
> On 05/16/2012 09:29 AM, Bernd Schubert wrote:
>> On 05/16/2012 04:06 PM, James Bottomley wrote:
>>> On Wed, 2012-05-16 at 14:28 +0200, Bernd Schubert wrote:
>>>> shost->can_queue ->   62 here
>>>> shost->host_busy ->   62 when one of the multipath groups does IO,
>>>> further multipath groups then seem to get stalled.
>>>>
>>>> I'm not sure yet why multipath_busy() does not stall IO when there is a
>>>> passive path in the prio group.
>>>>
>>>> Any idea how to properly address this problem?
>>>
>>> shost->can_queue is supposed to represent the maximum number of possible
>>> outstanding commands per HBA (i.e. the HBA hardware limit).  Assuming
>>> the driver got it right, the only way of increasing this is to buy a
>>> better HBA.
>>
>> HBA is a mellanox IB adapter. I have not checked yet where the limit of
>
> What driver is this with? SRP or iSER or something else?

It's SRP. The command queue limit comes from SRP_RQ_SIZE. The value seems
a bit low, IMHO, and it's definitely lower than needed for optimal
performance. However, given that I get good performance when
multipath_busy() is a noop, I think the multipath_busy() behaviour is the
primary issue here. It is always possible that a single LUN uses all
command queue slots, but other LUNs still shouldn't be stalled completely.
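
To make the stall easier to see, here is a minimal userspace sketch of
the check as I understand it. It is not the kernel code: the model_*
names are hypothetical, and it assumes only what was said above, namely
that a host counts as busy once host_busy reaches can_queue and that a
priority group is treated as busy when all of its paths sit on a busy
host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-HBA counters quoted in this thread. */
struct model_host {
    int can_queue;  /* host-wide limit on outstanding commands */
    int host_busy;  /* commands currently outstanding on the host */
};

/* Hypothetical stand-in for one path of a multipath priority group. */
struct model_path {
    const struct model_host *host;
};

/* Host-level check: busy once outstanding commands reach the limit. */
static bool model_host_is_busy(const struct model_host *h)
{
    return h->can_queue > 0 && h->host_busy >= h->can_queue;
}

/*
 * Group-level check (assumed behaviour): the group is busy only if it
 * has at least one path and every path goes through a busy host.
 */
static bool model_group_is_busy(const struct model_path *paths, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!model_host_is_busy(paths[i].host))
            return false;   /* one usable path is enough */
    }
    return n > 0;
}
```

With can_queue = 62 and host_busy = 62, model_group_is_busy() returns
true for every group whose paths all use that host, no matter which LUN
issued the 62 outstanding commands. Because the counter is per host and
not per LUN, one active group can hold off all the others.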

So in summary we actually have two issues:

1) Unfair queuing/waiting of dm-mpath, which stalls an entire path and 
brings down overall performance.

2) Low SRP command queue limits. Is there a reason why
SRP_RQ_SHIFT/SRP_RQ_SIZE and the values derived from them are
so small?
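
For issue 2, a rough sketch of how such a limit can follow from the
shift. The function name is hypothetical, and the concrete numbers (a
shift of 6, two slots reserved for non-command traffic) are assumptions
chosen only because they reproduce the can_queue of 62 reported above;
the real definitions are in the SRP initiator header and should be
checked there.

```c
#include <assert.h>

/*
 * Hypothetical helper: the queue size is assumed to be 1 << shift, and
 * 'reserved' slots are assumed to be held back for traffic other than
 * regular SCSI commands.
 */
static unsigned int model_cmd_slots(unsigned int shift, unsigned int reserved)
{
    unsigned int rq_size = 1u << shift;

    return rq_size > reserved ? rq_size - reserved : 0;
}
```

Under these assumptions model_cmd_slots(6, 2) gives 62, and raising the
shift by one to 7 would give 126, so each increment roughly doubles the
number of commands that can be outstanding per host.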

Thanks,
Bernd
