> -----Original Message-----
> From: Richard Weinberger [mailto:richard.weinberger@xxxxxxxxx]
> Sent: Saturday, July 12, 2014 9:17 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; ohering@xxxxxxxx;
> jbottomley@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx; apw@xxxxxxxxxxxxx;
> linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>
> On Thu, Jul 10, 2014 at 12:33 PM, Richard Weinberger
> <richard.weinberger@xxxxxxxxx> wrote:
> > On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan <kys@xxxxxxxxxxxxx> wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Christoph Hellwig [mailto:hch@xxxxxxxxxxxxx]
> >>> Sent: Wednesday, July 9, 2014 1:44 AM
> >>> To: KY Srinivasan
> >>> Cc: linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx;
> >>> ohering@xxxxxxxx; jbottomley@xxxxxxxxxxxxx; jasowang@xxxxxxxxxx;
> >>> apw@xxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx
> >>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort
> >>> handler
> >>>
> >>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> >>> > Implement a simple abort handler. The host does not support
> >>> > "Abort"; just ensure that all inflight I/Os have been accounted for.
> >>>
> >>> The abort handler should abort a single command, not wait for all of
> >>> them.
> >>> What issue do you see that this tries to address?
> >>
> >> On Azure, we sometimes have unbounded I/O latencies and some
> >> distributions (such as SLES12) based on recent kernels are invoking the
> >> "Abort Handler". Unfortunately, our scsi emulation on the host does not
> >> support aborting a command.
> >> The issue I have seen is that the upper level scsi code attempts error
> >> recovery when the command times out and finally frees up the command.
> >> The host subsequently responds to the command that has timed out and
> >> since the memory has been freed up, we end up touching freed memory
> >> in this driver.
> >> Since the host is also doing error recovery, by just delaying
> >> the error handler in the guest until we can account for all the
> >> in-flight commands, we can get around the problem.
> >
> > I see strange issues in Azure and maybe they are related to this.
> > Some Linux machines crash in a way that no disk IO is possible (thus,
> > no SSH for me) but they still respond to ping. It happens rather
> > seldom (every few weeks).
> >
> > Do you see similar symptoms?
>
> ping?

Sorry for the delayed response. Yes, we have seen resets and potentially the
file system being mounted read-only because of the I/O timeouts. We have
increased the standard scsi timeouts. Implementing the Timedout handler as we
have done now should solve this problem.

K. Y

>
> --
> Thanks,
> //richard