Re: Mitigating excessive iSCSI initiator connection failed errors.

Benjamin ESTRABAUD <be@xxxxxxxxxx> · Fri, 11 Mar 2016 11:45:22 +0000

On 10/03/16 21:41, Mike Christie wrote:
> On 03/08/2016 12:57 PM, Benjamin ESTRABAUD wrote:
>> Hi,
>>
>> I've been facing a small issue lately when working with 128 iSCSI
>> targets on a single system with multiple iSCSI initiators connecting to
>> it (3 or 4 initiators). If I remove the iSCSI targets, or even just the
>> ACLs, from the system, even temporarily, I get flooded by thousands upon
>> thousands of connection failures from the hosts trying to log in to the
>> system, with messages like:
>>
>> [  923.560908] iSCSI Initiator Node: iqn.1994-05.com.redhat:c87d91366225
>> is not authorized to access iSCSI target portal group: 1.
>> [  923.561124] iSCSI Login negotiation failed.
>>
>> These are fine in small numbers, but when all of the hosts are combined
>> with so many targets the system gets overwhelmed: the kernel log gets
>> flooded and the network portal thread's CPU usage ramps up to close to
>> 100%.
>>
>> Is there a way to limit those, for instance by adding a delay between
>> the processing of login attempts that gradually increases with each
>> login failure?
>>
>> I can control the hosts, but that is not always a workable solution
>> (sometimes the initiators have been improperly configured and I end up
>> in the same situation again).
>
> Is this with Linux hosts only? We can implement something on the
> initiator side, like other OSes have, where for these types of errors it
> will stop retrying the relogin and the user then has to log in again
> manually later. We used to do that by default, but hit issues. We can
> just make the behavior a config option.

So far yes, I've only seen this with a RHEL 7.1 host with multipath
enabled on all the iSCSI LUNs. A user would take the RAID offline on the
target, causing the associated Volume and LIO targets to be taken
offline. Because this is a symmetrical dual-controller system, one path
of the multipath device becomes unresponsive (IOs submitted to it will
time out) and the other path, to the side which hosted the RAID, will
return iSCSI login errors. It seems that iscsid, or IOs submitted to the
iSCSI device by multipathd (path checker IOs, application IOs, etc.),
trigger a login request to the target very frequently. Since we have
between 64 and 128 targets, this ends up generating a DoS.
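
For what it's worth, the gradually increasing delay I asked about in my
first mail could look roughly like the Python sketch below. This is only
a userspace illustration of the idea, not LIO code: the LoginBackoff
class, its method names and the base_delay/max_delay parameters are all
made up for the example, and a real implementation would have to live in
the target's login path.

```python
import time


class LoginBackoff:
    """Per-initiator delay that doubles with each failed login.

    Hypothetical helper for illustration only; not part of LIO.
    """

    def __init__(self, base_delay=1.0, max_delay=60.0, clock=time.monotonic):
        self.base_delay = base_delay  # delay after the first failure, in seconds
        self.max_delay = max_delay    # upper bound on the delay, in seconds
        self.clock = clock            # injectable so the logic can be tested
        # initiator IQN -> (consecutive failures, earliest time of next attempt)
        self._state = {}

    def allow(self, iqn):
        """Return True if a login from this initiator may be processed now."""
        _, next_ok = self._state.get(iqn, (0, 0.0))
        return self.clock() >= next_ok

    def record_failure(self, iqn):
        """Register a failed login; return the delay before the next attempt."""
        failures, _ = self._state.get(iqn, (0, 0.0))
        failures += 1
        delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
        self._state[iqn] = (failures, self.clock() + delay)
        return delay

    def record_success(self, iqn):
        """Forget the failure history once a login succeeds."""
        self._state.pop(iqn, None)
```

The target would call allow() before processing a login from a given
IQN, record_failure() when the login is rejected and record_success()
when it completes. With the defaults above, a misconfigured initiator is
held off for 1s, then 2s, then 4s, and so on up to 60s.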

Having this behaviour back as a config option would help greatly, since
in this particular case automatic recovery is unlikely (somebody took
the storage offline on the target side, and it may not come back).
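
As a rough illustration of what such an option might decide on the
initiator side, here is a small Python sketch. It is only my assumption
about the shape of the logic, not how iscsid implements (or used to
implement) it: LoginResult, should_retry_login() and the
stop_on_fatal_login_error flag are hypothetical names.

```python
from enum import Enum


class LoginResult(Enum):
    """Simplified login outcomes (hypothetical, for illustration only)."""
    OK = "ok"
    NOT_AUTHORIZED = "not_authorized"    # e.g. ACL removed on the target
    TRANSPORT_ERROR = "transport_error"  # e.g. connection dropped or timed out


def should_retry_login(result, stop_on_fatal_login_error):
    """Decide whether the initiator should automatically retry the login.

    With the option enabled, an authorization failure is treated as fatal:
    the session stays down until the user logs in again manually.
    """
    if result is LoginResult.OK:
        return False  # logged in, nothing to retry
    if result is LoginResult.NOT_AUTHORIZED and stop_on_fatal_login_error:
        return False  # stop hammering a target that refuses us
    return True       # transient error, or option disabled: keep retrying
```

With the flag off the initiator keeps retrying as it does today; with it
on, only errors that may clear up on their own are retried.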

Thanks!

Regards,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html