Hi Tejun, I tested 2.6.18-rc4-mm2 on i386 platform with and without your patch. The behavior is the same as rc2 with your patch. No kernel error and the only relevant dmesg traces are: [ 125.048966] ata5: soft resetting port [ 125.202410] ata5.00: configured for UDMA/100 [ 125.202492] ata5: EH complete [End of dmesg] It did not proceed to add/attach sg device! In rc4-mm2 release without your patch, I forgot to power on pata drive and add drive will cause kernel error. Please see attachment for detailed log. I can reproduce the problem easily, please feel free to let me know what kind of log/traces can be collected. Thanks, Fajun On 8/24/06, Tejun Heo <htejun@xxxxxxxxx> wrote:
[cc'ing linux-ide again] Fajun Chen wrote: > Hi Tejun, > > Your patch is successful at fixing the kernel error when adding pata > drive on 2.6.18-rc2 but there's some sg availability issue. I did the > same sequence of test: power off drive -> remove drive -> power on > drive -> add drive. From dmesg log, pata drive was added back > successfully as sg0 but somehow it is unavailable for upper level. For > instance, sg_scan didn't detect it. It is not shown in > /sys/class/scsi_generic/. Sg utilities sg_inq and sg_dd complained > about no such device for sg0. Is there any magic command I need to > issue to populate sg information up to the user space? If not, then > there is still bug somewhere. Complete log of the test sequence is > attached. > > I'll hold off the testing on 2.6.18-rc4 for now. Please let me know if > you think it still makes sense to test your patch in the new release. > Thanks for your help! The thing is that the patch shouldn't really change behavior. With the patch, libata might be more forgiving about SCSI refcnt bugs but the code before and after are identical correctness-wise. ISTR there has been SCSI ref bug fixed sometime inbetween. I can't pinpoint it though. So, please try 2.5.18-rc4 w/ or w/o the patch. Pull libata-dev#pata-drivers, that's basically current linus #master + libata-dev#upstream updates + Alan's pata-drivers. Of course, I might be wrong, but I kind of have too little lead to follow on at the moment, especially because I can't reproduce it here. If testing new kernel on your target platform takes a lot of effort, just perform preliminary tests on x86. As far as SCSI and libata are concerned, there's little difference. -- tejun
Script started on Thu 24 Aug 2006 05:22:13 PM MDT ]0;fchen@RockySnow:~/Projects/gemstar/debug[fchen@RockySnow debug]$ te[K[K!tel telnet 192.168.0.2 Trying 192.168.0.2... Connected to 192.168.0.2. Escape character is '^]'. stdin, stdout, etderr: 0 1 2 sh-2.04# dmesg [ 0.000000] Linux version 2.6.18-rc4-mm2 (fchen@RockySnow) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Thu Aug 24 17:11:24 MDT 2006 [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] sanitize start [ 0.000000] sanitize end [ 0.000000] copy_e820_map() start: 0000000000000000 size: 00000000000a0000 end: 00000000000a0000 type: 1 [ 0.000000] copy_e820_map() type is E820_RAM [ 0.000000] copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2 [ 0.000000] copy_e820_map() start: 0000000000100000 size: 0000000003e00000 end: 0000000003f00000 type: 1 [ 0.000000] copy_e820_map() type is E820_RAM [ 0.000000] copy_e820_map() start: 00000000ffb00000 size: 0000000000500000 end: 0000000100000000 type: 2 [ 0.000000] BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) [ 0.000000] BIOS-e820: 0000000000100000 - 0000000003f00000 (usable) [ 0.000000] BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved) [ 0.000000] 63MB LOWMEM available. [ 0.000000] On node 0 totalpages: 16128 [ 0.000000] DMA zone: 4096 pages, LIFO batch:0 [ 0.000000] Normal zone: 12032 pages, LIFO batch:1 [ 0.000000] DMI 2.2 present. [ 0.000000] Allocating PCI resources starting at 10000000 (gap: 03f00000:fbc00000) [ 0.000000] Detected 735.048 MHz processor. [ 25.962253] Built 1 zonelists. Total pages: 16128 [ 25.962259] Kernel command line: rw root=/dev/ram [ 25.962865] Enabling fast FPU save and restore... done. [ 25.962874] Enabling unmasked SIMD FPU exception support... done. [ 25.962880] Initializing CPU#0 [ 25.962961] PID hash table entries: 256 (order: 8, 1024 bytes) [ 25.964819] Console: colour VGA+ 80x25 [ 25.966538] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes) [ 25.966741] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes) [ 25.974315] Memory: 55496k/64512k available (2230k kernel code, 8612k reserved, 671k data, 184k init, 0k highmem) [ 25.974437] virtual kernel memory layout: [ 25.974440] fixmap : 0xffffd000 - 0xfffff000 ( 8 kB) [ 25.974444] vmalloc : 0xc4800000 - 0xffffb000 ( 951 MB) [ 25.974447] lowmem : 0xc0000000 - 0xc3f00000 ( 63 MB) [ 25.974451] .init : 0xc03d8000 - 0xc0406000 ( 184 kB) [ 25.974454] .data : 0xc032d9b4 - 0xc03d5988 ( 671 kB) [ 25.974458] .text : 0xc0100000 - 0xc032d9b4 (2230 kB) [ 25.974855] Checking if this processor honours the WP bit even in supervisor mode... Ok. [ 26.034946] Calibrating delay using timer specific routine.. 1470.69 BogoMIPS (lpj=735346) [ 26.035189] Mount-cache hash table entries: 512 [ 26.035565] CPU: After generic identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000 [ 26.035595] CPU: L1 I cache: 16K, L1 D cache: 16K [ 26.035686] CPU: L2 cache: 128K [ 26.035744] CPU: After all inits, caps: 0383f9ff 00000000 00000000 00000040 00000000 00000000 00000000 [ 26.035759] Intel machine check architecture supported. [ 26.035822] Intel machine check reporting enabled on CPU#0. [ 26.035904] Compat vDSO mapped to ffffe000. [ 26.036007] CPU: Intel Celeron (Coppermine) stepping 0a [ 26.036146] Checking 'hlt' instruction... OK. [ 26.040649] checking if image is initramfs...it isn't (no cpio magic); looks like an initrd [ 26.700271] Freeing initrd memory: 4672k freed [ 26.701393] NET: Registered protocol family 16 [ 26.713849] PCI: PCI BIOS revision 2.10 entry at 0xfb400, last bus=1 [ 26.713942] PCI: Using configuration type 1 [ 26.714000] Setting up standard PCI resources [ 26.723021] Linux Plug and Play Support v0.97 (c) Adam Belay [ 26.723226] PCI: Probing PCI hardware [ 26.723293] PCI: Probing PCI hardware (bus 00) [ 26.723629] Boot video device is 0000:00:01.0 [ 26.723882] PCI quirk: region 4000-407f claimed by ICH4 ACPI/GPIO/TCO [ 26.723958] PCI quirk: region 4080-40bf claimed by ICH4 GPIO [ 26.724467] PCI: Transparent bridge - 0000:00:1e.0 [ 26.725900] PCI: Using IRQ router PIIX/ICH [8086/2410] at 0000:00:1f.0 [ 26.734370] PCI: Ignore bogus resource 6 [0:0] of 0000:00:01.0 [ 26.734464] PCI: Bridge: 0000:00:1e.0 [ 26.734522] IO window: b000-cfff [ 26.734586] MEM window: d4000000-d5ffffff [ 26.734649] PREFETCH window: 10000000-100fffff [ 26.734732] PCI: Setting latency timer of device 0000:00:1e.0 to 64 [ 26.734881] NET: Registered protocol family 2 [ 26.744873] IP route cache hash table entries: 512 (order: -1, 2048 bytes) [ 26.745219] TCP established hash table entries: 2048 (order: 1, 8192 bytes) [ 26.745325] TCP bind hash table entries: 1024 (order: 0, 4096 bytes) [ 26.745408] TCP: Hash tables configured (established 2048 bind 1024) [ 26.745474] TCP reno registered [ 26.747181] NTFS driver 2.1.27 [Flags: R/O]. [ 26.747406] io scheduler noop registered [ 26.747494] io scheduler anticipatory registered (default) [ 26.747602] io scheduler deadline registered [ 26.747712] io scheduler cfq registered [ 26.748054] pci_hotplug: PCI Hot Plug PCI Core version: 0.5 [ 26.748512] isapnp: Scanning for PnP cards... [ 27.104483] isapnp: No Plug & Play device found [ 27.257337] Real Time Clock Driver v1.12ac [ 27.257416] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled [ 27.501579] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A [ 27.745577] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A [ 27.747944] Floppy drive(s): fd0 is 1.44M [ 27.763094] FDC 0 is a post-1991 82077 [ 27.764520] RAMDISK driver initialized: 1 RAM disks of 16384K size 1024 blocksize [ 27.765100] 8139cp: 10/100 PCI Ethernet driver v1.2 (Mar 22, 2004) [ 27.765185] 8139cp 0000:01:08.0: This (id 10ec:8139 rev 10) is not an 8139C+ compatible chip [ 27.765275] 8139cp 0000:01:08.0: Try the "8139too" driver instead. [ 27.765464] 8139too Fast Ethernet driver 0.9.27 [ 27.765569] PCI: setting IRQ 10 as level-triggered [ 27.765576] PCI: Found IRQ 10 for device 0000:01:08.0 [ 27.765644] PCI: Sharing IRQ 10 with 0000:00:01.0 [ 27.766341] eth0: RealTek RTL8139 at 0xc480c000, 00:d0:c9:54:04:69, IRQ 10 [ 27.766416] eth0: Identified 8139 chip type 'RTL-8139B' [ 27.766826] PNP: No PS/2 controller found. Probing ports directly. [ 27.768741] serio: i8042 AUX port at 0x60,0x64 irq 12 [ 27.768877] serio: i8042 KBD port at 0x60,0x64 irq 1 [ 27.769480] mice: PS/2 mouse device common for all mice [ 27.769643] md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 [ 27.769710] md: bitmap version 4.39 [ 27.769904] TCP bic registered [ 27.769984] NET: Registered protocol family 1 [ 27.770053] NET: Registered protocol family 17 [ 27.770233] Using IPI Shortcut mode [ 27.770439] Time: tsc clocksource has been installed. [ 27.771204] md: Autodetecting RAID arrays. [ 27.771272] md: autorun ... [ 27.771325] md: ... autorun DONE. [ 27.771633] RAMDISK: Compressed image found at block 0 [ 28.537528] VFS: Mounted root (ext2 filesystem). [ 28.538554] Freeing unused kernel memory: 184k freed [ 28.655953] input: AT Translated Set 2 keyboard as /class/input/input0 [ 31.374481] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1 [ 31.498219] warning: process `cmd_sock_server' used the removed sysctl system call [ 31.515686] warning: process `cmd_sock_server' used the removed sysctl system call [ 31.517293] warning: process `cmd_sock_server' used the removed sysctl system call [ 141.249637] warning: process `ls' used the removed sysctl system call [ 149.029301] SCSI subsystem initialized [ 149.117317] libata version 2.00 loaded. [ 149.129655] sata_sil24 0000:01:02.0: version 0.3 [ 149.129804] PCI: setting IRQ 11 as level-triggered [ 149.129868] PCI: Found IRQ 11 for device 0000:01:02.0 [ 149.130395] ata1: SATA max UDMA/100 cmd 0xC4820000 ctl 0x0 bmdma 0x0 irq 11 [ 149.130983] ata2: SATA max UDMA/100 cmd 0xC4822000 ctl 0x0 bmdma 0x0 irq 11 [ 149.131658] ata3: SATA max UDMA/100 cmd 0xC4824000 ctl 0x0 bmdma 0x0 irq 11 [ 149.132394] ata4: SATA max UDMA/100 cmd 0xC4826000 ctl 0x0 bmdma 0x0 irq 11 [ 149.133988] scsi0 : sata_sil24 [ 149.548645] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 149.550073] ata1.00: ATA-6, max UDMA/133, 78125000 sectors: LBA48 NCQ (depth 31/32) [ 149.551611] ata1.00: configured for UDMA/100 [ 149.551695] scsi1 : sata_sil24 [ 149.854550] ata2: SATA link down (SStatus 0 SControl 300) [ 149.854649] scsi2 : sata_sil24 [ 150.157457] ata3: SATA link down (SStatus 0 SControl 300) [ 150.157543] scsi3 : sata_sil24 [ 150.460368] ata4: SATA link down (SStatus 0 SControl 300) [ 150.460730] Vendor: ATA Model: ST340014AS Rev: 8.12 [ 150.461532] Type: Direct-Access ANSI SCSI revision: 05 [ 150.475322] pata_sil680 0000:01:0d.0: version 0.3.2 [ 150.475440] sil680: BA5_EN = 1 clock = 00 [ 150.475502] sil680: BA5_EN = 1 clock = 10 [ 150.475576] sil680: 133MHz clock. [ 150.475658] PCI: setting IRQ 15 as level-triggered [ 150.475720] PCI: Found IRQ 15 for device 0000:01:0d.0 [ 150.475971] ata5: PATA max UDMA/133 cmd 0xB800 ctl 0xBC02 bmdma 0xC800 irq 15 [ 150.476140] ata6: PATA max UDMA/133 cmd 0xC000 ctl 0xC402 bmdma 0xC808 irq 15 [ 150.476238] scsi4 : pata_sil680 [ 150.628794] ata5.00: ATA-6, max UDMA/100, 586072368 sectors: LBA48 [ 150.630288] ata5.00: configured for UDMA/100 [ 150.630371] scsi5 : pata_sil680 [ 151.960469] ATA: abnormal status 0x7F on port 0xC007 [ 151.960979] Vendor: ATA Model: ST3300831A Rev: 2.04 [ 151.961770] Type: Direct-Access ANSI SCSI revision: 05 [ 151.975757] scsi 0:0:0:0: Attached scsi generic sg0 type 0 [ 151.976010] scsi 4:0:0:0: Attached scsi generic sg1 type 0 [ 166.665477] ata1: exception Emask 0x10 SAct 0x0 SErr 0x80000 action 0x2 frozen [ 166.665585] ata1: (irq_stat 0x01100010, PHY RDY changed) [ 167.906233] ata1: soft resetting port [ 167.906308] ata1: SATA link down (SStatus 0 SControl 300) [ 167.906381] ata1: failed to recover some devices, retrying in 5 secs [ 173.425609] ata1: hard resetting port [ 175.445019] ata1: SATA link down (SStatus 0 SControl 300) [ 175.445089] ata1: failed to recover some devices, retrying in 5 secs [ 180.964393] ata1: hard resetting port [ 183.589635] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 183.592578] ata1.00: configured for UDMA/100 [ 183.592654] ata1: EH complete [ 288.313094] ata5: soft resetting port [ 289.589115] ATA: abnormal status 0x7F on port 0xB807 [ 290.717469] ATA: abnormal status 0x7F on port 0xB807 [ 291.845618] ATA: abnormal status 0x7F on port 0xB807 [ 292.970969] ATA: abnormal status 0x7F on port 0xB807 [ 294.096315] ATA: abnormal status 0x7F on port 0xB807 [ 295.221660] ATA: abnormal status 0x7F on port 0xB807 [ 325.213051] ata5.00: qc timeout (cmd 0xec) [ 325.213138] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 325.213204] ata5.00: revalidation failed (errno=-5) [ 325.213269] ata5: failed to recover some devices, retrying in 5 secs [ 330.212534] ata5: soft resetting port [ 331.488778] ATA: abnormal status 0x7F on port 0xB807 [ 332.617033] ATA: abnormal status 0x7F on port 0xB807 [ 333.745177] ATA: abnormal status 0x7F on port 0xB807 [ 334.870528] ATA: abnormal status 0x7F on port 0xB807 [ 335.995875] ATA: abnormal status 0x7F on port 0xB807 [ 337.121220] ATA: abnormal status 0x7F on port 0xB807 [ 367.112745] ata5.00: qc timeout (cmd 0xec) [ 367.112832] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 367.112898] ata5.00: revalidation failed (errno=-5) [ 367.112963] ata5: failed to recover some devices, retrying in 5 secs [ 372.112235] ata5: soft resetting port [ 373.388459] ATA: abnormal status 0x7F on port 0xB807 [ 374.516705] ATA: abnormal status 0x7F on port 0xB807 [ 375.644854] ATA: abnormal status 0x7F on port 0xB807 [ 376.770204] ATA: abnormal status 0x7F on port 0xB807 [ 377.895551] ATA: abnormal status 0x7F on port 0xB807 [ 379.020896] ATA: abnormal status 0x7F on port 0xB807 [ 409.012409] ata5.00: qc timeout (cmd 0xec) [ 409.012488] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 409.012554] ata5.00: revalidation failed (errno=-5) [ 409.012618] ata5.00: disabled [ 409.513207] ata5: soft resetting port [ 410.789460] ATA: abnormal status 0x7F on port 0xB807 [ 410.789626] ata5: EH complete [ 410.789696] BUG: warning at lib/kref.c:32/kref_get() [ 410.789764] [<c0234c14>] kref_get+0x44/0x50 [ 410.789868] [<c02342ca>] kobject_get+0x1a/0x30 [ 410.789953] [<c0278946>] get_device+0x16/0x30 [ 410.790057] [<c4838c8e>] scsi_device_get+0x3e/0x80 [scsi_mod] [ 410.790224] [<c4864341>] ata_scsi_remove_dev+0x31/0xd0 [libata] [ 410.790391] [<c486441e>] ata_scsi_hotplug+0x3e/0xc0 [libata] [ 410.790499] [<c0121c4f>] run_workqueue+0x6f/0xd0 [ 410.790596] [<c48643e0>] ata_scsi_hotplug+0x0/0xc0 [libata] [ 410.790704] [<c0121e18>] worker_thread+0x168/0x190 [ 410.790791] [<c010e210>] default_wake_function+0x0/0x20 [ 410.790890] [<c010e210>] default_wake_function+0x0/0x20 [ 410.790979] [<c0121cb0>] worker_thread+0x0/0x190 [ 410.791064] [<c0125387>] kthread+0xb7/0xc0 [ 410.791157] [<c01252d0>] kthread+0x0/0xc0 [ 410.791241] [<c01031ab>] kernel_thread_helper+0x7/0x1c [ 410.791335] ======================= [ 410.791404] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000058 [ 410.791517] printing eip: [ 410.791570] c4838c9e [ 410.791620] *pde = 00000000 [ 410.791676] Oops: 0000 [#1] [ 410.791729] 8K_STACKS [ 410.791804] last sysfs file: [ 410.791857] Modules linked in: sg pata_sil680 sata_sil24 libata scsi_mod [ 410.792071] CPU: 0 [ 410.792073] EIP: 0060:[<c4838c9e>] Not tainted VLI [ 410.792077] EFLAGS: 00010086 (2.6.18-rc4-mm2 #1) [ 410.792261] EIP is at scsi_device_get+0x4e/0x80 [scsi_mod] [ 410.792325] eax: 00000000 ebx: c2848400 ecx: 000003d4 edx: 00000001 [ 410.792393] esi: c2848490 edi: c2864448 ebp: c28642c0 esp: c2809ee0 [ 410.792459] ds: 007b es: 007b ss: 0068 [ 410.792519] Process ata_aux (pid: 740, ti=c2808000 task=c3edfa70 task.ti=c2808000) [ 410.792586] Stack: c2848490 c2848400 00000282 c4864341 c2848400 c3a5bd40 00000000 c390dfa4 [ 410.792887] c2864448 00000001 c28642c0 c28642c0 c486441e c2864448 c28662ac 00000296 [ 410.793189] c39a02a0 c0121c4f c28642c0 000001e8 bee2c8c6 c39a02b8 c39a02a8 c48643e0 [ 410.793490] Call Trace: [ 410.793586] [<c4864341>] ata_scsi_remove_dev+0x31/0xd0 [libata] [ 410.793699] [<c486441e>] ata_scsi_hotplug+0x3e/0xc0 [libata] [ 410.793807] [<c0121c4f>] run_workqueue+0x6f/0xd0 [ 410.793893] [<c48643e0>] ata_scsi_hotplug+0x0/0xc0 [libata] [ 410.794001] [<c0121e18>] worker_thread+0x168/0x190 [ 410.794088] [<c010e210>] default_wake_function+0x0/0x20 [ 410.794177] [<c010e210>] default_wake_function+0x0/0x20 [ 410.794266] [<c0121cb0>] worker_thread+0x0/0x190 [ 410.794351] [<c0125387>] kthread+0xb7/0xc0 [ 410.794436] [<c01252d0>] kthread+0x0/0xc0 [ 410.794520] [<c01031ab>] kernel_thread_helper+0x7/0x1c [ 410.794607] ======================= [ 410.794662] Code: 5c 24 04 89 d0 8b 74 24 08 83 c4 0c c3 8d b3 90 00 00 00 89 34 24 e8 a2 fc a3 fb ba fa ff ff ff 85 c0 74 db 8b 03 ba 01 00 00 00 <8b> 40 58 8b 00 85 c0 74 0b 83 38 02 74 1d ff 80 20 01 00 00 85 [ 410.796319] EIP: [<c4838c9e>] scsi_device_get+0x4e/0x80 [scsi_mod] SS:ESP 0068:c2809ee0 [ 410.796476] sh-2.04# exit exit Connection closed by foreign host. ]0;fchen@RockySnow:~/Projects/gemstar/debug[fchen@RockySnow debug]$ exit exit Script done on Thu 24 Aug 2006 05:22:56 PM MDT