RE: [isci PATCH v3 3/4] libsas: suspend / resume support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dan,

I found problem about SATA error handle and this also led to panic when run suspend/resume test. Thanks again for your share debug method and patch of testing suspend/resume problem.

The problem is:

When probe libata probe sata will try to reset device and send IDENTIFY DEVICE.. commands when 3 times fail will give up but never tell libsas and lldd to clean up the resource. And if some time later the device came up report BROADCAST CHANGE and libsas found device try to  tell lldd, but lldd is not forget the original one will reject this. 
Should we export sas_unregister_devs_sas_addr let ata call this when give up.


For suspend/resume here I've seen Hard LOCKUP when this kind of things happen.
Will look into the problem deeper to see the root cause.

************************************************************************

Jack Wang 王金浦 Software Engineer

RD2 (Storage) 

Department of Storage Products R&D, SCS

Universal Scientific Industrial (Shanghai) Co., Ltd

421 Lishizhen Rd, Pudong New Area, Shanghai, 

P.R. China, 201203

上海市浦东新区李时珍路421号    邮编: 201203

Tel :  +86 -21-5896 6996  ext. 81526

Fax : +86 -21-5080 4268

http://www.usi.com.tw  MAIL: jack_wang@xxxxxxxxx

************************************************************************

 
> -----邮件原件-----
> 发件人: linux-scsi-owner@xxxxxxxxxxxxxxx
> [mailto:linux-scsi-owner@xxxxxxxxxxxxxxx] 代表 Williams, Dan J
> 发送时间: 2012年3月16日 6:16
> 收件人: Alan Stern
> 抄送: linux-scsi@xxxxxxxxxxxxxxx; linux-ide@xxxxxxxxxxxxxxx; Jacek Danecki
> 主题: Re: [isci PATCH v3 3/4] libsas: suspend / resume support
> 
> On Thu, Mar 15, 2012 at 12:54 PM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, 15 Mar 2012, Williams, Dan J wrote:
> >
> >> > Hmm... I was wondering why we needed a kernel global sync.  If this is
> >> > just for probe work what about just doing something like the following?
> >> > Keep the sync operation local to probe-work and not entangle with the
> >> > resume-work?  I'll give this a shot when I get back to my test system.
> >> >
> >> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> > index bd17cf8..ec69e35 100644
> >> > --- a/drivers/scsi/sd.c
> >> > +++ b/drivers/scsi/sd.c
> >> > @@ -2629,6 +2629,8 @@ static void sd_probe_async(void *data,
> async_cookie_t cookie)
> >> >        put_device(&sdkp->dev);
> >> >  }
> >> >
> >> > +static LIST_HEAD(sd_probe_domain);
> >
> > Take a look at fea6d607e154cf96ab22254ccb48addfd43d4cb5; it might be a
> > good idea to make this domain available to scsi_bus_prepare().  For
> > example, it could made into a SCSI-wide domain, defined in the SCSI
> > core and exported for use by sd.
> 
> Nice, thanks for the pointer.  Yes, I'll up-level this.
> 
> >
> >> This works fine for me for resolving the deadlock, but I found I also
> >> needed the following to fix a spurious:
> >>
> >>    scsi 6:0:1:0: Illegal state transition deleted->running
> >>
> >> ...in the resume path.
> >>
> >> @@ -2348,10 +2349,11 @@ EXPORT_SYMBOL(scsi_device_quiesce);
> >>   *
> >>   *     Must be called with user context, may sleep.
> >>   */
> >> -void
> >> -scsi_device_resume(struct scsi_device *sdev)
> >> +void scsi_device_resume(struct scsi_device *sdev)
> >>  {
> >> -       if(scsi_device_set_state(sdev, SDEV_RUNNING))
> >> +       /* check if the device was deleted during suspend */
> >> +       if (sdev->sdev_state == SDEV_DEL ||
> >> +           scsi_device_set_state(sdev, SDEV_RUNNING))
> >>                 return;
> >>         scsi_run_queue(sdev->request_queue);
> >>  }
> >>
> >> Unless someone can point out something I'm missing I'll go ahead and
> >> roll this into it's own patch and rebase/drop the hack out of the
> >> libsas resume code.
> >
> > The device might be in some other state.  Perhaps it would be better to
> > do
> >
> >        if (sdev->sdev_state != SDEV_QUIESCE ||
> >                        scsi_device_set_state(sdev, SDEV_RUNNING))
> >
> > I'm not sure what guarantees this function is supposed to provide, but
> > the comment indicates that it's meant just to handle quiesced devices.
> >
> 
> I'm not sure either, but I can get on board with this change to say
> "the world changed when you weren't looking, assume whomever changed
> the state is taking care of the device".
> 
> --
> Dan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux