On Tue, Apr 26, 2016 at 11:24 PM, Dan Lane <dracodan@xxxxxxxxx> wrote:
>
> On Apr 24, 2016 11:50 PM, "Dan Lane" <dracodan@xxxxxxxxx> wrote:
>>
>> Nick,
>> I tried raising the timeouts (to double the original) and also
>> increasing/decreasing the queue depth; neither seemed to make a
>> difference, although at this point the infrequency of the crashes has
>> made troubleshooting quite challenging.
>>
>> I recently had an idea, though. I've always felt it was the target
>> giving up, not the ESXi hosts, but I couldn't think of a good way to
>> prove it before. It occurred to me that I could run a subset of hosts
>> until the target became unavailable, then power those hosts off
>> before powering on a different host that would be unaware of the
>> other hosts deciding the link was unavailable. The result was that
>> the host I powered on alone did NOT see the storage! Once I rebooted
>> the target server, the host in question could see the storage. If you
>> can think of a way that the information about the target being
>> unusable could be passed to a host that was only powered on later
>> (other than via vCenter, which was powered off), I would love to hear
>> it. I may honestly be missing something here, but I can't think of
>> anything that would cause this other than the storage server failing.
>>
>> Thanks,
>> Dan
>>
>> On Thu, Mar 31, 2016 at 2:48 AM, Nicholas A. Bellinger
>> <nab@xxxxxxxxxxxxxxx> wrote:
>> > On Wed, 2016-03-30 at 15:01 -0400, Dan Lane wrote:
>> >> Nicholas,
>> >> Can you please take another look at the Fibre Channel code and
>> >> see if you can find the cause of the problems with ESXi/VAAI?
>> >
>> > There are two options at this point for getting to the root cause
>> > of why your ESX hosts are continuously generating I/O timeouts and
>> > subsequent ABORT_TASKs, even when the hosts are idle.
>> >
>> > First, have you looked into changing the ESX host FC LLD side
>> > timeout and queue depth that I asked about earlier here?
>> >
>> > http://www.spinics.net/lists/target-devel/msg11844.html
>> >
>> > What effect does reducing the queue depth and increasing the
>> > timeout have on the rate at which ABORT_TASKs are being generated?
>> >
>> > Beyond that, you'll need to engage the QLogic folks (CC'ed)
>> > directly to generate a firmware dump on both the host and target
>> > sides for them to analyze and figure out what's going on.
>> >
>> > Are you using QLogic HBAs on the ESX host side as well?
>> >
>
> Resending this in case it was ignored because I got excited and forgot
> to correct Google's automatic top posting... Yep, totally blaming
> Google there!
>
> Nick,
> I tried raising the timeouts (to double the original) and also
> increasing/decreasing the queue depth; neither seemed to make a
> difference, although at this point the infrequency of the crashes has
> made troubleshooting quite challenging.
>
> I recently had an idea, though. I've always felt it was the target
> giving up, not the ESXi hosts, but I couldn't think of a good way to
> prove it before. It occurred to me that I could run a subset of hosts
> until the target became unavailable, then power those hosts off before
> powering on a different host that would be unaware of the other hosts
> deciding the link was unavailable. The result was that the host I
> powered on alone did NOT see the storage! Once I rebooted the target
> server, the host in question could see the storage. If you can think
> of a way that the information about the target being unusable could be
> passed to a host that was only powered on later (other than via
> vCenter, which was powered off), I would love to hear it. I may
> honestly be missing something here, but I can't think of anything that
> would cause this other than the storage server failing.
>
> Thanks,
> Dan

And then my phone decides to convert it to HTML, causing it to get
dropped... ugh
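[Archive note: for anyone following the thread, the ESX-side timeout and
queue-depth tunables Nick asks about are FC driver module parameters on
the host. A rough sketch of inspecting and changing them on an ESXi host
with a QLogic HBA follows. The module name (qlnativefc) and parameter
names (ql2xmaxqdepth, qlport_down_retry) are assumptions based on the
qla2xxx driver family; the values are placeholders, not recommendations.
Verify the actual module and parameter names on your host before setting
anything.]

```shell
# Show the QLogic FC driver's current module parameters.
# The module name varies by ESXi version/driver (e.g. qlnativefc, qla2xxx).
esxcli system module parameters list -m qlnativefc

# Set a lower per-LUN queue depth and a longer port-down retry count in
# one call. Note that "parameters set" replaces the module's whole
# parameter string, so all desired options go in a single -p argument.
esxcli system module parameters set -m qlnativefc \
    -p "ql2xmaxqdepth=32 qlport_down_retry=60"

# Module parameter changes take effect only after a host reboot.
```

Comparing the rate of ABORT_TASKs in the target's logs before and after
such a change is one way to answer Nick's question above.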