Re: Mitigating excessive iSCSI initiator connection failed errors.


 



On 12/03/16 04:25, Nicholas A. Bellinger wrote:
On Fri, 2016-03-11 at 12:21 +0000, Benjamin ESTRABAUD wrote:
On 11/03/16 11:45, Benjamin ESTRABAUD wrote:
On 10/03/16 21:41, Mike Christie wrote:
On 03/08/2016 12:57 PM, Benjamin ESTRABAUD wrote:
Hi,

I've been facing a small issue lately when working with 128 iSCSI
targets on a single system with multiple iSCSI initiators (3 or 4)
connecting to it. If I remove the iSCSI targets, or even just the ACLs,
from the system, even temporarily, I get flooded by thousands upon
thousands of connection failures from the hosts trying to log in to the
system, with messages like:

[  923.560908] iSCSI Initiator Node: iqn.1994-05.com.redhat:c87d91366225
is not authorized to access iSCSI target portal group: 1.
[  923.561124] iSCSI Login negotiation failed.

These are fine in small numbers, but when all of the hosts are
multiplied across so many targets, the system gets overwhelmed: the
kernel logger gets flooded and the network portal thread's CPU usage
ramps up to close to 100%.

Is there a way to limit these, for instance by adding a delay between
processing login attempts that gradually increases with each login
failure?

I can control the hosts, but that is not always a practical solution
(sometimes the initiators have been misconfigured and I end up back in
the same situation).
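A minimal sketch of the kind of throttle asked about above, written in Python purely to illustrate the policy: consecutive failed logins from the same initiator IQN double the wait before that initiator's next login is processed, capped at a maximum, and a successful login resets the count. `LoginThrottle`, its methods, and the 1 s / 60 s defaults are hypothetical; this is not an existing LIO interface, and a real version would live in the target's login path.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LoginThrottle:
    """Hypothetical per-initiator login throttle (not an LIO interface).

    Each consecutive failed login from the same initiator IQN doubles the
    delay before that initiator's next login is processed, up to max_delay.
    A successful login resets the counter.
    """

    base_delay: float = 1.0   # seconds to wait after the first failure
    max_delay: float = 60.0   # cap, so a fixed initiator is not locked out for long
    failures: Dict[str, int] = field(default_factory=dict)
    next_allowed: Dict[str, float] = field(default_factory=dict)

    def delay_for(self, failures: int) -> float:
        """Backoff delay after `failures` consecutive failed logins."""
        if failures <= 0:
            return 0.0
        return min(self.base_delay * 2 ** (failures - 1), self.max_delay)

    def should_process(self, iqn: str, now: Optional[float] = None) -> bool:
        """False while this initiator is still inside its backoff window."""
        now = time.monotonic() if now is None else now
        return now >= self.next_allowed.get(iqn, 0.0)

    def record_failure(self, iqn: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        count = self.failures.get(iqn, 0) + 1
        self.failures[iqn] = count
        self.next_allowed[iqn] = now + self.delay_for(count)

    def record_success(self, iqn: str) -> None:
        self.failures.pop(iqn, None)
        self.next_allowed.pop(iqn, None)


if __name__ == "__main__":
    throttle = LoginThrottle()
    iqn = "iqn.1994-05.com.redhat:c87d91366225"
    throttle.record_failure(iqn, now=0.0)           # next login allowed at t=1.0
    throttle.record_failure(iqn, now=1.0)           # next login allowed at t=3.0
    print(throttle.should_process(iqn, now=2.0))    # False
    print(throttle.should_process(iqn, now=3.0))    # True
```

Keying on the IQN alone means all of a misconfigured host's logins, across every target, share one counter and back off together; keying on (IQN, target) would be the finer-grained alternative. A real implementation would also need to bound the table so a flood of distinct IQNs cannot grow it without limit.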


Is this with Linux hosts only? We can implement something on the
initiator side like other OSes have, where for these types of errors the
initiator stops retrying the relogin and the user has to log in again
manually later. We used to do that by default, but hit issues. We can
just make the behavior a config option.
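A rough sketch of the config option described above, assuming the initiator decides from the Status-Class field of the Login Response (RFC 3720 defines 0 = success, 1 = redirection, 2 = initiator error, 3 = target error). `RetryPolicy` and `stop_on_initiator_error` are hypothetical names, not existing open-iscsi settings, and treating class 2 as "stop" is one plausible choice rather than actual open-iscsi logic. The option defaults to off, matching the current behavior of retrying indefinitely.

```python
from dataclasses import dataclass
from enum import IntEnum


class LoginStatusClass(IntEnum):
    """Status-Class field of an iSCSI Login Response, per RFC 3720."""
    SUCCESS = 0
    REDIRECTION = 1
    INITIATOR_ERROR = 2
    TARGET_ERROR = 3


@dataclass
class RetryPolicy:
    # Hypothetical option name; off by default, i.e. keep retrying forever.
    stop_on_initiator_error: bool = False

    def should_retry(self, status: LoginStatusClass) -> bool:
        """Decide whether the initiator schedules another relogin."""
        if status == LoginStatusClass.SUCCESS:
            return False  # logged in, nothing to retry
        if status == LoginStatusClass.INITIATOR_ERROR:
            # e.g. the ACL no longer admits this initiator: retrying cannot
            # help until someone fixes the target, so optionally stop and
            # leave the session for a manual login.
            return not self.stop_on_initiator_error
        # Redirection and target errors may succeed later; keep retrying.
        return True


if __name__ == "__main__":
    default = RetryPolicy()
    strict = RetryPolicy(stop_on_initiator_error=True)
    print(default.should_retry(LoginStatusClass.INITIATOR_ERROR))  # True
    print(strict.should_retry(LoginStatusClass.INITIATOR_ERROR))   # False
    print(strict.should_retry(LoginStatusClass.TARGET_ERROR))      # True
```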
So far, yes: I've only seen this with a RHEL 7.1 host with multipath
enabled on all the iSCSI LUNs. A user would take the RAID offline on the
target, causing the associated volume and LIO targets to be taken
offline. Because this is a symmetrical dual-controller system, one path
of the multipath device becomes unresponsive (I/Os submitted to it time
out) and the other path, to the controller that hosted the RAID, returns
iSCSI login errors. It seems that iscsid, or the I/Os submitted to the
iSCSI device by multipathd (path checker I/Os, application I/Os, etc.),
trigger a login request to the target very frequently. Since we have
between 64 and 128 targets, this ends up amounting to a DoS.

Having this behaviour back as a config option would greatly help, since
in this particular example automatic recovery is unlikely (somebody took
the storage offline on the target side, and it may not come back).

Actually, tweaking "node.session.iscsi.DefaultTime2Wait = 2" seems to
help a lot.

DefaultTime2Wait controls the delay before the initiator attempts a
session + connection reconnect. Note that this parameter is negotiated
during login, and can also be set per TargetName+TargetPortalGroupTag
context.
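As I understand RFC 3720, DefaultTime2Wait is a numerical key in the range 0 to 3600 seconds, defaults to 2, and is negotiated with a "Maximum" result function, so an initiator that offers a larger value (like the 10 seconds tried in this thread) gets at least that value whatever the target offers. A small sketch of that rule; `negotiate_time2wait` is a hypothetical helper, and whether a given initiator uses the negotiated value or its local node setting for its reconnect delay is not modeled here.

```python
# Assumed per RFC 3720: DefaultTime2Wait is a numerical key in 0..3600
# seconds, defaults to 2, and its negotiation result function is "Maximum".
T2W_RANGE = range(0, 3601)
T2W_DEFAULT = 2


def negotiate_time2wait(initiator_offer: int = T2W_DEFAULT,
                        target_offer: int = T2W_DEFAULT) -> int:
    """Return the DefaultTime2Wait both sides end up using."""
    for side, value in (("initiator", initiator_offer), ("target", target_offer)):
        if value not in T2W_RANGE:
            raise ValueError(f"{side} offered DefaultTime2Wait={value}, outside 0..3600")
    # "Maximum" result function: the larger offer wins.
    return max(initiator_offer, target_offer)


if __name__ == "__main__":
    print(negotiate_time2wait(10, 2))  # 10: the initiator's larger value wins
    print(negotiate_time2wait())       # 2: both sides at the default
```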

Changing this from "2" to "10" caused the iSCSI sessions on the
"yanked" targets to attempt to reconnect about five times less often,
which is more or less what I was looking for (mitigating the number of
connection attempts). Was what you were describing a way for the iSCSI
initiator to eventually give up, or a way to change how often the
initiator retries?
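A back-of-the-envelope check of the "five times less often" observation, under idealized assumptions: four initiators with one session to each of 128 targets, every failed session retrying exactly once per DefaultTime2Wait, and login attempts taking no time. Real counts will be lower because each attempt also spends time in TCP connect and login; the point is only that the attempt rate scales as 1/DefaultTime2Wait, so going from 2 to 10 seconds divides it by 5. `attempts_per_minute` is a hypothetical helper.

```python
def attempts_per_minute(sessions: int, time2wait_s: float) -> float:
    """Upper bound on login attempts per minute hitting the target.

    Assumes each failed session retries once every time2wait_s seconds and
    that an attempt itself takes no time.
    """
    if time2wait_s <= 0:
        raise ValueError("time2wait_s must be positive")
    return sessions * 60.0 / time2wait_s


if __name__ == "__main__":
    sessions = 4 * 128  # assumed: 4 initiators x 128 targets, one session each
    before = attempts_per_minute(sessions, 2)
    after = attempts_per_minute(sessions, 10)
    print(before, after, before / after)  # 15360.0 3072.0 5.0
```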

node.session.timeo.replacement_timeout controls the timeout for
completing outstanding I/O (with error status) from open-iscsi back to
scsi-core, separate from iscsi session reconnect.
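To see why the two settings are independent, here is an idealized model (an assumption, not open-iscsi's exact timing): after the connection drops, a reconnect attempt starts every DefaultTime2Wait seconds, attempts take no time, and replacement_timeout counts from the same moment. Raising replacement_timeout changes how many attempts happen before outstanding I/O is failed back to scsi-core (and hence to multipath), but not how often they happen. `attempts_before_io_failback` is a hypothetical helper; 120 seconds is, to my knowledge, the value in open-iscsi's stock iscsid.conf.

```python
import math


def attempts_before_io_failback(replacement_timeout_s: float,
                                time2wait_s: float) -> int:
    """Reconnect attempts started before outstanding I/O is failed back.

    Idealized model: attempts at time2wait_s, 2*time2wait_s, ... after the
    connection drops, each taking no time, with replacement_timeout_s
    measured from the same instant.
    """
    if time2wait_s <= 0:
        raise ValueError("time2wait_s must be positive")
    return math.floor(replacement_timeout_s / time2wait_s)


if __name__ == "__main__":
    for t2w in (2, 10):
        # Same retry interval each row; only how long I/O is held differs.
        print(t2w, attempts_before_io_failback(120, t2w),
              attempts_before_io_failback(240, t2w))
```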
Thanks for that, I'll increase this timeout as well.





