James Bottomley wrote:
On Tue, 2008-07-15 at 14:25 -0600, Matthew Wilcox wrote:
Do we need to worry about a host in the SHOST_DEL state? In that case, it will still
exist to some degree, but scsi_host_get will fail. For example, what happens if a
shell is in /sys/class/scsi_host/host5/ and you delete host 5 and try to add another.
Couldn't you run into the same problem? In that case the scsi_host_get will fail.
I suppose you could check specifically for -ENXIO getting returned...
Or we could make the host_no a u64 and avoid the problem ever happening
in our lifetimes. I'm amazed that anyone's had the time to do 4 billion
add/removes, to be honest. Assuming it takes 1 second per add/remove
cycle, and there's not even time to scan a bus in that time, that's
still 136 years.
Actually, right at the moment, a lot of the udev stuff is conditioned on
a non repeating host number (which is why we don't use idr like we do
for the other things). I'm really reluctant to go to a u64 host
number ... what was the use case that produced this problem?
James
All of it started with some functional tests against the pata_pdc2027x
module, which include a loop of rmmod/modprobe cycles (around 10000).
Before I started working on it, this functional test had begun to fail,
each time at a different point.
Just to make it clear, I am adding some kernel messages and mon info
below, along with some additional comments.
We can see that the first and third times (on rmmod) a panic happened
far from the short int border (around 19741 and 9887). The second time,
we can see that it happens on modprobe when going from 65535 to 0. This
pointed me to the patch I submitted, which I used to check whether all of
it would be "fixed". After that patch (I know it is far from a good
patch) I got this rmmod/modprobe loop running for more than 4 days with
no kernel panic. It made me believe that it somehow prevents the first
and third panics from happening.
I am pretty new to these pieces of code and I probably don't have a full
overview of them. That is why I would like to have your input and opinions.
I would really appreciate that.
Daniel Debonzi
First occurrence:
**********************************************
Vendor: IBM Model: DROM00205 Rev: NR36
Type: CD-ROM ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19740:0:0:0: Attached scsi generic sg4 type 5
ata19739.00: disabled
Vendor: IBM Model: DROM00205 Rev: NR36
Type: CD-ROM ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19742:0:0:0: Attached scsi generic sg4 type 5
ata19741.00: disabled
Unable to handle kernel paging request for data at address 0xd0000000008c3e98
Faulting instruction address: 0xd0000000001db250
cpu 0x3: Vector: 300 (Data Access) at [c0000000391173c0]
pc: d0000000001db250: .scsi_target_reap_usercontext+0x90/0x114 [scsi_mod]
lr: d0000000001db244: .scsi_target_reap_usercontext+0x84/0x114 [scsi_mod]
sp: c000000039117640
msr: 8000000000001032
dar: d0000000008c3e98
dsisr: 40000000
current = 0xc00000007182dc60
paca = 0xc000000000475400
pid = 535, comm = udevd
3:mon> t
[c0000000391176e0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117760] d0000000001da190 .scsi_target_reap+0xc8/0x100 [scsi_mod]
[c0000000391177f0] d0000000001db5c8 .scsi_device_dev_release_usercontext+0xc8/0x120 [scsi_mod]
[c0000000391178a0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117920] d0000000001db4e8 .scsi_device_dev_release+0x24/0x3c [scsi_mod]
[c0000000391179a0] c00000000022580c .device_release+0x4c/0x78
[c000000039117a20] c0000000001b2388 .kobject_cleanup+0x90/0xf0
[c000000039117ac0] c0000000001b3470 .kref_put+0x84/0xa0
[c000000039117b40] c0000000001b22e0 .kobject_put+0x28/0x40
[c000000039117bc0] c00000000014dbe8 .sysfs_release+0x48/0xe0
[c000000039117c50] c0000000000eca1c .__fput+0x108/0x25c
[c000000039117d00] c0000000000e8fa4 .filp_close+0xac/0xd4
[c000000039117d90] c0000000000eacf4 .sys_close+0xc4/0x110
[c000000039117e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007e7d7d0
SP (ff8dc360) is in userspace
**********************************************
Second occurrence:
**********************************************
Vendor: IBM Model: DROM00205 Rev: NR38
Type: CD-ROM ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65529:0:0:0: Attached scsi generic sg11 type 5
ata65529.00: disabled
Vendor: IBM Model: DROM00205 Rev: NR38
Type: CD-ROM ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65531:0:0:0: Attached scsi generic sg11 type 5
ata65531.00: disabled
Vendor: IBM Model: DROM00205 Rev: NR38
Type: CD-ROM ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65533:0:0:0: Attached scsi generic sg11 type 5
ata65533.00: disabled
kobject_add failed for host0 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[C0000000B565B1A0] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B240] [C0000000001B28E0] .kobject_add+0x1a4/0x1fc
[C0000000B565B2E0] [C000000000229EE0] .class_device_add+0xb4/0x4e4
[C0000000B565B3B0] [D0000000001D1F2C] .scsi_add_host+0xf8/0x208 [scsi_mod]
[C0000000B565B450] [D00000000052B52C] .ata_scsi_add_hosts+0xa4/0x160 [libata]
[C0000000B565B500] [D000000000527C0C] .ata_host_register+0xec/0x368 [libata]
[C0000000B565B5D0] [D000000000527F1C] .ata_host_activate+0x94/0xe0 [libata]
[C0000000B565B680] [D0000000007D11B0] .pdc2027x_init_one+0x36c/0x39c [pata_pdc2027x]
[C0000000B565B730] [C0000000001C3530] .pci_device_probe+0x13c/0x1dc
[C0000000B565B7F0] [C0000000002287F0] .driver_probe_device+0xa0/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
slab error in kmem_cache_destroy(): cache `scsi_cmd_cache': Can't free all objects
Call Trace:
[C0000000B565B070] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B110] [C0000000000E4020] .kmem_cache_destroy+0x94/0x1b0
[C0000000B565B1A0] [D0000000001D12D8] .scsi_destroy_command_freelist+0xa0/0xcc [scsi_mod]
[C0000000B565B230] [D0000000001D1720] .scsi_host_dev_release+0x80/0xe0 [scsi_mod]
[C0000000B565B2C0] [C00000000022580C] .device_release+0x4c/0x78
[C0000000B565B340] [C0000000001B2388] .kobject_cleanup+0x90/0xf0
[C0000000B565B3E0] [C0000000001B3470] .kref_put+0x84/0xa0
[C0000000B565B460] [C0000000001B22E0] .kobject_put+0x28/0x40
[C0000000B565B4E0] [C000000000225968] .put_device+0x1c/0x30
[C0000000B565B560] [D0000000001D168C] .scsi_host_put+0x14/0x28 [scsi_mod]
[C0000000B565B5E0] [D000000000528058] .ata_host_release+0xf0/0x14c [libata]
[C0000000B565B680] [C00000000022C720] .release_nodes+0x1c8/0x22c
[C0000000B565B750] [C00000000022CB98] .devres_release_all+0x58/0xd4
[C0000000B565B7F0] [C000000000228860] .driver_probe_device+0x110/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Unable to handle kernel paging request for data at address 0x3a30322e332f3040
Faulting instruction address: 0xc0000000000843e4
cpu 0x4: Vector: 300 (Data Access) at [c0000000b565af10]
pc: c0000000000843e4: .kthread_stop+0x3c/0xfc
lr: c0000000000843e0: .kthread_stop+0x38/0xfc
sp: c0000000b565b190
msr: 8000000000009032
dar: 3a30322e332f3040
dsisr: 40000000
current = 0xc0000000ea1294d0
paca = 0xc000000000475600
pid = 15221, comm = modprobe
*********************************************
Third occurrence:
**********************************************
<7>pata_pdc2027x 0001:cc:01.0: version 0.74-ac5
<6>pata_pdc2027x 0001:cc:01.0: PLL input clock 32758 kHz
<6>ata9887: PATA max UDMA/133 cmd 0xD000080084DC07C0 ctl 0xD000080084DC0FDA bmdma 0xD000080084DC0000 irq 166
<6>ata9888: PATA max UDMA/133 cmd 0xD000080084DC05C0 ctl 0xD000080084DC0DDA bmdma 0xD000080084DC0008 irq 166
<6>scsi9887 : pata_pdc2027x
<6>ata9887.00: ATAPI, max UDMA/33
<6>ata9887.00: configured for UDMA/33
<6>scsi9888 : pata_pdc2027x
<4>ATA: abnormal status 0x8 on port 0xD000080084DC05DF
<5> Vendor: IBM Model: DROM00205 Rev: NR36
<5> Type: CD-ROM ANSI SCSI revision: 02
<4>sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
<7>sr 9887:0:0:0: Attached scsi CD-ROM sr0
<5>sr 9887:0:0:0: Attached scsi generic sg0 type 5
<4>ata9887.00: disabled
<1>Unable to handle kernel paging request for data at address 0xd000000000047878
<1>Faulting instruction address: 0xd0000000000821f0
cpu 0x1: Vector: 300 (Data Access) at [c000000070f03580]
pc: d0000000000821f0: .scsi_device_dev_release_usercontext+0x40/0x1ac [scsi_mod]
lr: c000000000077394: .execute_in_process_context+0x54/0xa0
sp: c000000070f03800
msr: 8000000000009032
dar: d000000000047878
dsisr: 40000000
current = 0xc000000002cacad0
paca = 0xc0000000004a3780
pid = 2086, comm = hald