Re: Fw: [Bugme-new] [Bug 5998] New: oops on mount "kernel access of bad area, sig: 11 [#1]"

Andrew Morton wrote:
Begin forwarded message:

Date: Thu, 2 Feb 2006 15:03:45 -0800
From: bugme-daemon@xxxxxxxxxxxxxxxxxxx
To: bugme-new@xxxxxxxxxxxxxx
Subject: [Bugme-new] [Bug 5998] New: oops on mount "kernel access of bad area, sig: 11 [#1]"


http://bugzilla.kernel.org/show_bug.cgi?id=5998

           Summary: oops on mount "kernel access of bad area, sig: 11 [#1]"
    Kernel Version: 2.6.15-rc5
            Status: NEW
          Severity: normal
             Owner: drivers_ieee1394@xxxxxxxxxxxxxxxxxxxx
         Submitter: johnstul@xxxxxxxxxx


Most recent kernel where this bug did not occur: unknown
Distribution: Ubuntu
Hardware Environment: ppc32 Apple Mac mini
Problem Description:

After over 10 days of uptime, mounting and unmounting my external firewire
hard drive for backups, I got the following oops today when trying to mount the
drive.

I am Cc'ing linux-scsi because it is not entirely clear (to me) whether sbp2 or the upper layers cause this, although I suspect sbp2's interaction with the SCSI core to be the culprit again.

I've seen problems where the cable gets bumped loose and I'll see
something similar, however I have not been able to verify whether the cable was
secure when this occurred.  The mount command is still hung, but the box seems to
be running fine.

When the cable is pulled during I/O, sbp2 does not take care to finish SCSI commands that were enqueued right before the cable pull. (Other hardware problems may have the same effect.) I discovered this problem when I still ran Linux 2.6.14; there it simply led to knodemgrd being stuck in uninterruptible sleep in blk_execute_rq(). I have not yet checked how Linux 2.6.15, or configurations other than preemptible i386 uniprocessor, would react to this.

A fix for this is making its way downstream. (Upstream?)
http://www.kernel.org/git/?p=linux/kernel/git/scjody/ieee1394.git;a=commitdiff;h=61daa34c132c5d4ed8630e2c46e9bf2f0c7b3428

I don't know if the patch alone can be applied to 2.6.15, but this patch set can:
http://me.in-berlin.de/~s5r6/linux1394/updates/

ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: command: cdb[0]=0x28: 28 00 09 27 c0 03 00 00 02 00
ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: command: cdb[0]=0x0: 00 00 00 00 00 00
Oops: kernel access of bad area, sig: 11 [#1]
PREEMPT
NIP: C0023FF8 LR: C0023FF8 SP: EFC0FEB0 REGS: efc0fe00 TRAP: 0300 Not tainted
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00000000, DSISR: 40000000
TASK = c134b230[317] 'scsi_eh_0' THREAD: efc0e000
Last syscall: -1
GPR00: 00000000 EFC0FEB0 C134B230 00000001 C05BDE28 FFFFFFFF C0650000 C0664E84
GPR08: 00040000 00000001 C13DF400 EFC0E000 C0650000 00000000 00000000 00000000
GPR16: 00000000 C02EF5A0 C06070EC C0586F7C C06070EC C0586F7C C06070EC C0650000
GPR24: 00000003 C13EE204 C13EE268 00000001 00009032 00000000 C13EE1C0 C02EF440
NIP [c0023ff8] complete+0x28/0x90
LR [c0023ff8] complete+0x28/0x90
Call trace:
 [c02ef45c] scsi_eh_done+0x1c/0x30
 [c03248b8] sbp2scsi_abort+0x158/0x170
 [c02efbcc] scsi_send_eh_cmnd+0x10c/0x1a0
 [c02efce8] scsi_eh_tur+0x88/0xe0
 [c02f0930] scsi_error_handler+0x450/0xa10
 [c00437e8] kthread+0x108/0x110
 [c0007534] kernel_thread+0x44/0x60
note: scsi_eh_0[317] exited with preempt_count 1
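
For readers unfamiliar with the "command: cdb[0]=..." lines in the log: opcode 0x28 is READ(10) and opcode 0x00 is TEST UNIT READY, the latter matching scsi_eh_tur in the call trace. The following is only a minimal userspace sketch of how the READ(10) bytes can be decoded; struct read10 and decode_read10 are names made up for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative container for the two READ(10) fields of interest. */
struct read10 {
    uint32_t lba;      /* logical block address: CDB bytes 2..5, big-endian */
    uint16_t nblocks;  /* transfer length in blocks: CDB bytes 7..8, big-endian */
};

/* Decode a 10-byte CDB. Returns 0 on success, -1 if it is not READ(10). */
int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    assert(cdb != NULL && out != NULL);

    if (cdb[0] != 0x28)   /* 0x28 is the READ(10) opcode */
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Fed the bytes 28 00 09 27 c0 03 00 00 02 00 from the log, this yields LBA 0x0927c003 and a transfer length of 2 blocks.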


Steps to reproduce: Not easily reproduced.

It actually looks different from what I described above. Perhaps scsi_eh_done was called on a command which had already been completed shortly before. sbp2scsi_abort calls the "done" handler for the command to be aborted and for all other pending commands (so that the latter are enqueued again). sbp2 also calls the done handler right after an SBP-2 reconnect in sbp2_update, i.e. after the FireWire bus was reset. Perhaps one or the other or both places in sbp2 need to take the Scsi_Host's host_lock.
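
To illustrate the kind of serialization meant here, the following is a userspace sketch only, not sbp2 code and not a proposed patch. It assumes a hypothetical struct cmd with a "completed" flag and uses a pthread mutex as a stand-in for host_lock; complete_once and count_done are invented names. The point is merely that if both the abort path and the reconnect path go through one locked helper, the done handler cannot run twice for the same command.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command; NOT the kernel's struct scsi_cmnd. */
struct cmd {
    bool completed;               /* set once the done handler has run */
    int done_calls;               /* how many times the handler actually ran */
    void (*done)(struct cmd *);   /* completion callback */
};

/* Hypothetical stand-in for the Scsi_Host's host_lock. */
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Example done handler for the sketch: it only counts invocations. */
void count_done(struct cmd *c)
{
    c->done_calls++;
}

/*
 * Complete a command at most once. The check of the flag and the call of
 * the done handler happen under the same lock, so two racing callers
 * cannot both see the command as still pending.
 * Returns true if this caller was the one that completed the command.
 */
bool complete_once(struct cmd *c)
{
    bool first;

    assert(c != NULL && c->done != NULL);

    pthread_mutex_lock(&host_lock);
    first = !c->completed;
    c->completed = true;
    if (first)
        c->done(c);
    pthread_mutex_unlock(&host_lock);

    return first;
}
```

Calling complete_once twice on the same command runs the handler once; the second call returns false and does nothing. Whether the real done handler may be called with host_lock held is a separate question that this sketch does not answer.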

John, do you have the syslog still available from when the oops occurred? Was there a reconnect logged?

This report from October may be related: "slab error in cache_free_debugcheck(): cache `sgpool-8"
http://marc.theaimsgroup.com/?t=112931959700002
It did appear (to the reporter) as if this problem had been fixed lately, but maybe it was only masked by an unrelated change.
--
Stefan Richter
-=====-=-==- --=- ---==
http://arcgraph.de/sr/