Re: Unexpected issues with 2 NVME initiators using the same target

I don't understand, is this new with the patch applied?

I applied your patch to 4.12-rc6 on the initiator, but my targets are
still 4.9.33 since it looked like the patch only affected the
initiator. I did not see this before your patch, but I also didn't try
rebooting the targets multiple times before because of the previous
messages.

That sounds like a separate issue. Should we move forward with the
suggested patch?

After this and a reboot of the target, the initiator would drop the
connection after 1.5-2 minutes, then faster and faster until it was
every 5 seconds. It is almost as if it set up the connection and then
lost the first ping, or the ping wasn't set up right. I tried rebooting
the target multiple times.


So the initiator could not recover even after the target was available
again?

The initiator recovered the connection when the target came back, but
the connection was not stable. I/O would happen on the connection,
then it would get shaky and then finally disconnect. Then it would
reconnect, pass more I/O, then get shaky and go down again. With the 5
second disconnects, it would pass traffic for 5 seconds, then as soon
as I saw the ping timeout, the I/O would stop until it reconnected. At
that point it seems that the lack of pings would kill the I/O, unlike
earlier, where there was a stall in I/O and then the connection would
be torn down. I can try to see if I can get it to happen again.

So it looks like the target is not responding to NOOP_OUTs (or to any
traffic at all, for that matter).

The messages:
[Tue Jun 20 10:11:20 2017] iSCSI Login timeout on Network Portal [::]:3260

indicate that something is stuck in the login thread, though I'm not
sure where. Did you see a watchdog popping on a hang?

And the message:
[Tue Jun 20 10:11:58 2017] isert: isert_print_wc: login send failure:
transport retry counter exceeded (12) vend_err 81

Is an indication that the rdma fabric is in some error state.
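
If it helps when trying to reproduce this, here is a rough Python sketch
for pulling these two kinds of messages out of `dmesg -T` style output and
looking at the time between them. The patterns are based only on the two
lines quoted above, it assumes each kernel message sits on a single line
with a `[Tue Jun 20 10:11:20 2017]` style timestamp, and the names
`classify` and `gaps_seconds` are made up for illustration.

```python
import re
from datetime import datetime

# Patterns based only on the two messages quoted in this thread (assumption:
# other kernels may word these differently).
PATTERNS = {
    "login_timeout": re.compile(r"iSCSI Login timeout on Network Portal"),
    "isert_send_failure": re.compile(r"isert: isert_print_wc: login send failure"),
}

# Human-readable timestamp prefix, e.g. "[Tue Jun 20 10:11:20 2017] ..."
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\] (.*)$")


def classify(lines):
    """Return (timestamp, kind) for every line matching a known pattern."""
    events = []
    for line in lines:
        m = STAMP.match(line.strip())
        if not m:
            continue  # no timestamp we understand, skip the line
        when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        for kind, pattern in PATTERNS.items():
            if pattern.search(m.group(2)):
                events.append((when, kind))
    return events


def gaps_seconds(events):
    """Seconds between consecutive events, in the order they were logged."""
    times = [when for when, _ in events]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# The two messages from this thread, with the second one joined onto one line.
sample = [
    "[Tue Jun 20 10:11:20 2017] iSCSI Login timeout on Network Portal [::]:3260",
    "[Tue Jun 20 10:11:58 2017] isert: isert_print_wc: login send failure: "
    "transport retry counter exceeded (12) vend_err 81",
]

events = classify(sample)
for when, kind in events:
    print(when.isoformat(), kind)
print(gaps_seconds(events))
```

Running the same thing over the full target log from each reboot attempt
should show whether the login timeouts come before or after the isert
send failures.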

On which reboot attempt did all this happen? The first one?

Again, CCing target-devel.