Re: System reboots after insertion and removal of disks in 2.6.18 kernel


 



Thanks for your prompt reply, Tejun.

On Sat, Mar 29, 2008 at 7:20 PM, Tejun Heo <htejun@xxxxxxxxx> wrote:
> Hello,
>
> Sagar Borikar wrote:

> > transition as well. So I get messages like "COMRESET failed" and "hardreset
> > failed". This doesn't happen if I insert the drive back immediately; the
> > system recovers immediately.
>
> Okay, that was one long paragraph.  :-)
>
> The behavior itself (sans triggering machine reset) is intended.  libata
> EH doesn't rely on the edge events (PHY status changed).  It relies on
> level state (PHY readiness) and as long as at least one PHY event is
> triggered after link status has changed, it doesn't care what polarity
> those events are or how many of them are.  That was the design decision
> made for robustness.

I understand. But the issue is that if I insert another drive, that event
gets detected; only the remove event is not detected. So I was wondering:
if I can somehow get the remove events detected, I can go ahead from there.
I am also digging further into the code.
As expected, in the insert_remove case dev->class becomes
ATA_DEV_UNKNOWN, and hence ata_eh_revalidate_and_attach() does not take
the following if condition:
"action & ATA_EH_REVALIDATE && ata_dev_ready(dev)"
I am attaching two logs, remove and insert_remove, which show the flow of
the sequence in these two paths. If you could browse through them, that
would be great.

> > ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x2 frozen
> > ata2: hard resetting port
> > ata2: port is slow to respond, please be patient
> > ata2: port failed to respond (30 secs)   ---> At this point the drive has
> > already been removed, but the removal is not detected.
> > ata2: COMRESET failed (device not ready)
> > ata2: hardreset failed, retrying in 5 secs
> > ata2: hard resetting port
> > ata2: SATA link down (SStatus 0 SControl 310)
> > ata2: EH complete
>
> This is quite an old kernel, right?  Recent ones take much less time to
> detect the condition.

That's right, it is 2.6.18.

> > PMON2000 MIPS Initializing. Standby...
> > ERRORPC=bfc00004 CONFIG=0042e4bb STATUS=00400000
> > CPU PRID 000034c1, MaskID 00001320
> > Initializing caches...done (CONFIG=0042e4bb)
> > Switching to runtime address map...done
> > Setting up SDRAM controller: sdram config 0x80010000
> > master clock 100 Mhz, MulFundBIU 0x02, DivXSDRAM 0x02
> > sdram freq 0x09ef21aa hz, sdram period: 0x06 nsec
> > dimm0: density 256Mbit, width 16, single-sided, unbuffered, size 0x08000000
> >  supported CAS latency: 2.5 2, using 2.5 cycles, byte18=0x0c
> >  RAS to CAS delay (tRCD) 0x12 nsec, byte29=0x
>
> Okay, and then the machine rebooted.  It's weird that the reset happens
> *after* EH is complete.  After EH complete is printed, libata won't
> touch the hardware.  I'm sorry but I don't have any clue why the machine
> is getting rebooted.  Does the machine reset on oops?

Also, it happens after roughly 1 to 1.5 minutes. If I insert the drive
within that window, the reset doesn't happen. Also, if I insert it in any
other slot, the reset doesn't happen. The reset happens only when the disk
is removed immediately after being inserted.
Is there any way I can make the insertion event edge-triggered?

Thanks in advance
Sagar
> --
> tejun
>
[root@NAS00180001310e ~]# sil_host_intr:1
sil_host_intr:4 ata_port_freeze
sil_freeze start
sil_freeze end
ata1 port frozen
ata_bmdma_error_handler: start
ata_do_eh : ata_eh_autopsy
ata_eh_autopsy : start
ata_do_eh : ata_eh_report
ata1: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0x2 frozen
ata_do_eh : ata_eh_recover
ata_eh_detach_dev : start
ata_eh_detach_dev : ata_eh_prep_resume
ata_eh_detach_dev : ata_eh_skip_recovery
sil_freeze start
sil_freeze end
ata1 port frozen
ata_eh_detach_dev : ata_eh_reset
ata_std_prereset: start
ata1: hard resetting port
ata_port_offline : port is offline
sata_print_link_status:start
ata1: SATA link down (SStatus 0 SControl 310)
sata_print_link_status:end
ata_eh_detach_dev : ata_eh_thaw_port
ata_eh_detach_dev : ata_eh_revalidate_and_attach
ata_port_offline : port is offline
ata1: failed to recover some devices, retrying in 5 secs
ata_eh_detach_dev : ata_eh_prep_resume
ata_eh_detach_dev : ata_eh_skip_recovery
sil_freeze start
sil_freeze end
ata1 port frozen
ata_eh_detach_dev : ata_eh_reset
ata_std_prereset: start
ata1: hard resetting port
lcdout: error response: ng
ata_port_offline : port is offline
sata_print_link_status:start
ata1: SATA link down (SStatus 0 SControl 310)
sata_print_link_status:end
ata_eh_detach_dev : ata_eh_thaw_port
ata_eh_detach_dev : ata_eh_revalidate_and_attach
ata_port_offline : port is offline
ata1: failed to recover some devices, retrying in 5 secs
ata_eh_detach_dev : ata_eh_prep_resume
ata_eh_detach_dev : ata_eh_skip_recovery
sil_freeze start
sil_freeze end
ata1 port frozen
ata_eh_detach_dev : ata_eh_reset
ata_std_prereset: start
ata1: hard resetting port
ata_port_offline : port is offline
sata_print_link_status:start
ata1: SATA link down (SStatus 0 SControl 310)
sata_print_link_status:end
ata_eh_detach_dev : ata_eh_thaw_port
ata_eh_detach_dev : ata_eh_revalidate_and_attach
ata_port_offline : port is offline
ata1.00: disabled
ata_port_offline : port is offline
ata_eh_detach_dev : start
ata_eh_detach_dev : start
ata_eh_detach_dev : ata_eh_prep_resume
ata_eh_detach_dev : ata_eh_skip_recovery
ata_eh_detach_dev : ata_eh_revalidate_and_attach
ata_eh_detach_dev : ata_eh_resume
ata_eh_detach_dev : ata_eh_suspend
ata_do_eh : ata_eh_finish
ata_eh_finish : start
ata_bmdma_error_handler: end
ata1: EH complete
ata1.00: detaching (SCSI 0:0:0:0)
[root@NAS00180001310e ~]# Drive 1 inserted
sil_host_intr:1
sil_host_intr:4 ata_port_freeze
sil_freeze start
sil_freeze end
ata1 port frozen
ata_bmdma_error_handler: start
ata_do_eh : ata_eh_autopsy
ata_eh_autopsy : start
ata_do_eh : ata_eh_report
ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0x2 frozen
ata_do_eh : ata_eh_recover
ata_eh_detach_dev : start
ata_eh_detach_dev : ata_eh_prep_resume
ata_eh_detach_dev : ata_eh_skip_recovery
sil_freeze start
sil_freeze end
ata1 port frozen
ata_eh_detach_dev : ata_eh_reset
ata_std_prereset: start
ata1: hard resetting port
ata_port_offline : port is offline
sata_print_link_status:start
ata1: SATA link down (SStatus 0 SControl 310)
sata_print_link_status:end
ata_eh_detach_dev : ata_eh_thaw_port
ata_eh_detach_dev : ata_eh_revalidate_and_attach
ata_eh_detach_dev : ata_eh_resume
ata_eh_detach_dev : ata_eh_suspend
ata_do_eh : ata_eh_finish
ata_eh_finish : start
ata_bmdma_error_handler: end
ata1: EH complete

