Re: CPU soft lockup during iscsi connection close with 3.18.16

Vedavyas Duggirala <vduggira@xxxxxxxxx> · Thu, 17 Dec 2015 11:37:54 -0500

Hi,

Thanks for your response.

> Seeing these occasionally on a loaded system is not unusual.  Seeing
> them ongoing at multiple times a second as in your logs below, means
> your backend is having trouble completing I/Os within that 5 second
> latency requirement for ESX iSCSI.
>
> As mentioned in the thread above, you can try reducing the
> default_cmdsn_depth or NodeACL cmdsn_depth on the target side, to reduce
> the number of active I/Os each initiator can keep in flight in parallel.
>
> Since you're using such a large number of ESX hosts (~20) connected to a
> single target, the default_cmdsn_depth = 64 value is certainly too high
> for you.

I have checked the backend storage configuration. The machine is
serving LUNs off SSDs. I verified our stats: there was not a whole
lot of I/O happening at that time (20-100 IOPS). In fact, other than
ESX health checks and heartbeats, nothing was writing to the disk.

Disabling ATS for heartbeats on that LUN seems to help stability. It
has been running without issues for three weeks, as opposed to
hanging roughly once a week before.

The same issue happens on a smaller 8-node ESX cluster. There, too,
disabling ATS for heartbeats helps.

These results are without reducing default_cmdsn_depth. I will try
that out in the next maintenance window.

-Vyas