Re: scsi_host_alloc does not check for used shost->host_no

James Bottomley wrote:
> On Tue, 2008-07-15 at 14:25 -0600, Matthew Wilcox wrote:
> > > Do we need to worry about a host in the SHOST_DEL state? In that case, it will still
> > > exist to some degree, but scsi_host_get will fail. For example, what happens if a
> > > shell is in /sys/class/scsi_host/host5/ and you delete host 5 and try to add another.
> > > Couldn't you run into the same problem? In that case the scsi_host_get will fail.
> > > I suppose you could check specifically for -ENXIO getting returned...
> > Or we could make the host_no a u64 and avoid the problem ever happening
> > in our lifetimes.  I'm amazed that anyone's had the time to do 4 billion
> > add/removes, to be honest.  Assuming it takes 1 second per add/remove
> > cycle, and there's not even time to scan a bus in that time, that's
> > still 136 years.
>
> Actually, right at the moment, a lot of the udev stuff is conditioned on
> a non repeating host number (which is why we don't use idr like we do
> for the other things).  I'm really reluctant to go to a u64 host
> number ... what was the use case that produced this problem?
>
> James
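
For reference, the 136-year figure quoted above can be checked with a few lines of C. This is only a rough sketch: the function names are made up for illustration, a year is taken as 365.25 days, and each add/remove cycle is assumed to use exactly one host number. The same arithmetic for a 16-bit counter, which is roughly where the wrap shows up in the logs further down, comes out at well under a day.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds in a 365.25-day year; good enough for a rough estimate. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/*
 * Seconds needed to use up every value of a counter that is `bits`
 * wide, when one add/remove cycle takes `secs_per_cycle` seconds and
 * consumes one number. Hypothetical helper, not a kernel function.
 */
double seconds_to_exhaust(unsigned int bits, double secs_per_cycle)
{
	assert(bits >= 1 && bits <= 63);
	return (double)(UINT64_C(1) << bits) * secs_per_cycle;
}

/* The same estimate expressed in years (handy for a 32-bit counter). */
double years_to_exhaust(unsigned int bits, double secs_per_cycle)
{
	return seconds_to_exhaust(bits, secs_per_cycle) / SECONDS_PER_YEAR;
}

/* The same estimate expressed in hours (handy for a 16-bit counter). */
double hours_to_exhaust(unsigned int bits, double secs_per_cycle)
{
	return seconds_to_exhaust(bits, secs_per_cycle) / 3600.0;
}
```

With one second per cycle, years_to_exhaust(32, 1.0) gives a little over 136 years, while hours_to_exhaust(16, 1.0) gives a little over 18 hours, so a counter that wraps at 16 bits is well within reach of an automated rmmod/modprobe loop.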

It all started with some functional tests against the pata_pdc2027x module, which include a large number of rmmod/modprobe cycles (around 10000). Before I started working on it, this functional test had already started to fail, sometimes at different points.

To make this clearer, I am including some kernel messages and mon output below, along with some additional comments.

In the first and third occurrences (on rmmod), the panic happened far from the short int boundary (host numbers around 19741 and 9887). In the second occurrence, it happened on modprobe when the host number wrapped from 65535 to 0. This pointed me to the patch I submitted, which I used to check whether all of these would be "fixed". With that patch applied (I know it is far from a good patch), the rmmod/modprobe loop ran for more than 4 days with no kernel panic. That makes me believe it somehow also prevents the first and third panics.
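
To illustrate the wrap in the second occurrence, here is a minimal user-space sketch. It is not the submitted patch and not the scsi_host_alloc code; struct host_table and every function name in it are made up for illustration. It assumes a 16-bit counter and a simple table of host numbers that are still registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOSTS 65536u	/* number of distinct 16-bit host numbers */

/* Hypothetical stand-in for the set of host numbers still registered. */
struct host_table {
	bool in_use[MAX_HOSTS];
	uint16_t next;		/* 16-bit counter: wraps from 65535 to 0 */
};

/* Unchecked: returns the counter value even if that number is taken. */
uint16_t alloc_unchecked(struct host_table *t)
{
	assert(t != NULL);
	return t->next++;
}

/*
 * Checked: skips numbers that are still in use.
 * Returns 0 and stores the number in *out, or -1 if every number is taken.
 */
int alloc_checked(struct host_table *t, uint16_t *out)
{
	assert(t != NULL && out != NULL);
	for (uint32_t tries = 0; tries < MAX_HOSTS; tries++) {
		uint16_t candidate = t->next++;

		if (!t->in_use[candidate]) {
			t->in_use[candidate] = true;
			*out = candidate;
			return 0;
		}
	}
	return -1;
}

/* Marks a host number as free again (e.g. after the host is removed). */
void release_host(struct host_table *t, uint16_t no)
{
	assert(t != NULL);
	t->in_use[no] = false;
}
```

If host 0 is still registered and the counter sits at 65535, two calls to alloc_unchecked() return 65535 and then 0, which matches the "kobject_add failed for host0 with -EEXIST" message below. alloc_checked() would return 65535 and then 1 instead. Note that skipping used numbers still reuses numbers after a wrap, so it does not address the udev concern about non-repeating host numbers.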

I am pretty new to these pieces of code and probably don't have a full overview of them. That is why I would like to have your input and opinions.

I really appreciate that.
Daniel Debonzi

First occurrence:
**********************************************
   Vendor: IBM       Model: DROM00205         Rev: NR36
  Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19740:0:0:0: Attached scsi generic sg4 type 5
ata19739.00: disabled
  Vendor: IBM       Model: DROM00205         Rev: NR36
  Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
sr 19742:0:0:0: Attached scsi generic sg4 type 5
ata19741.00: disabled
Unable to handle kernel paging request for data at address 0xd0000000008c3e98
Faulting instruction address: 0xd0000000001db250
cpu 0x3: Vector: 300 (Data Access) at [c0000000391173c0]
    pc: d0000000001db250: .scsi_target_reap_usercontext+0x90/0x114 [scsi_mod]
    lr: d0000000001db244: .scsi_target_reap_usercontext+0x84/0x114 [scsi_mod]
    sp: c000000039117640
   msr: 8000000000001032
   dar: d0000000008c3e98
 dsisr: 40000000
  current = 0xc00000007182dc60
  paca    = 0xc000000000475400
    pid   = 535, comm = udevd


3:mon> t
[c0000000391176e0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117760] d0000000001da190 .scsi_target_reap+0xc8/0x100 [scsi_mod]
[c0000000391177f0] d0000000001db5c8 .scsi_device_dev_release_usercontext+0xc8/0x120 [scsi_mod]
[c0000000391178a0] c00000000007f35c .execute_in_process_context+0x54/0xa0
[c000000039117920] d0000000001db4e8 .scsi_device_dev_release+0x24/0x3c [scsi_mod]
[c0000000391179a0] c00000000022580c .device_release+0x4c/0x78
[c000000039117a20] c0000000001b2388 .kobject_cleanup+0x90/0xf0
[c000000039117ac0] c0000000001b3470 .kref_put+0x84/0xa0
[c000000039117b40] c0000000001b22e0 .kobject_put+0x28/0x40
[c000000039117bc0] c00000000014dbe8 .sysfs_release+0x48/0xe0
[c000000039117c50] c0000000000eca1c .__fput+0x108/0x25c
[c000000039117d00] c0000000000e8fa4 .filp_close+0xac/0xd4
[c000000039117d90] c0000000000eacf4 .sys_close+0xc4/0x110
[c000000039117e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007e7d7d0
SP (ff8dc360) is in userspace

**********************************************

Second occurrence:
**********************************************
  Vendor: IBM       Model: DROM00205         Rev: NR38
  Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65529:0:0:0: Attached scsi generic sg11 type 5
ata65529.00: disabled
  Vendor: IBM       Model: DROM00205         Rev: NR38
  Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65531:0:0:0: Attached scsi generic sg11 type 5
ata65531.00: disabled
  Vendor: IBM       Model: DROM00205         Rev: NR38
  Type:   CD-ROM                             ANSI SCSI revision: 02
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
sr 65533:0:0:0: Attached scsi generic sg11 type 5
ata65533.00: disabled
kobject_add failed for host0 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[C0000000B565B1A0] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B240] [C0000000001B28E0] .kobject_add+0x1a4/0x1fc
[C0000000B565B2E0] [C000000000229EE0] .class_device_add+0xb4/0x4e4
[C0000000B565B3B0] [D0000000001D1F2C] .scsi_add_host+0xf8/0x208 [scsi_mod]
[C0000000B565B450] [D00000000052B52C] .ata_scsi_add_hosts+0xa4/0x160 [libata]
[C0000000B565B500] [D000000000527C0C] .ata_host_register+0xec/0x368 [libata]
[C0000000B565B5D0] [D000000000527F1C] .ata_host_activate+0x94/0xe0 [libata]
[C0000000B565B680] [D0000000007D11B0] .pdc2027x_init_one+0x36c/0x39c [pata_pdc2027x]
[C0000000B565B730] [C0000000001C3530] .pci_device_probe+0x13c/0x1dc
[C0000000B565B7F0] [C0000000002287F0] .driver_probe_device+0xa0/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
slab error in kmem_cache_destroy(): cache `scsi_cmd_cache': Can't free all objects
Call Trace:
[C0000000B565B070] [C00000000000FFDC] .show_stack+0x68/0x1b0 (unreliable)
[C0000000B565B110] [C0000000000E4020] .kmem_cache_destroy+0x94/0x1b0
[C0000000B565B1A0] [D0000000001D12D8] .scsi_destroy_command_freelist+0xa0/0xcc [scsi_mod]
[C0000000B565B230] [D0000000001D1720] .scsi_host_dev_release+0x80/0xe0 [scsi_mod]
[C0000000B565B2C0] [C00000000022580C] .device_release+0x4c/0x78
[C0000000B565B340] [C0000000001B2388] .kobject_cleanup+0x90/0xf0
[C0000000B565B3E0] [C0000000001B3470] .kref_put+0x84/0xa0
[C0000000B565B460] [C0000000001B22E0] .kobject_put+0x28/0x40
[C0000000B565B4E0] [C000000000225968] .put_device+0x1c/0x30
[C0000000B565B560] [D0000000001D168C] .scsi_host_put+0x14/0x28 [scsi_mod]
[C0000000B565B5E0] [D000000000528058] .ata_host_release+0xf0/0x14c [libata]
[C0000000B565B680] [C00000000022C720] .release_nodes+0x1c8/0x22c
[C0000000B565B750] [C00000000022CB98] .devres_release_all+0x58/0xd4
[C0000000B565B7F0] [C000000000228860] .driver_probe_device+0x110/0x16c
[C0000000B565B890] [C000000000228A58] .__driver_attach+0xb4/0x138
[C0000000B565B920] [C000000000227F14] .bus_for_each_dev+0x7c/0xd4
[C0000000B565B9E0] [C000000000228694] .driver_attach+0x28/0x40
[C0000000B565BA60] [C00000000022797C] .bus_add_driver+0x98/0x18c
[C0000000B565BB00] [C000000000228E58] .driver_register+0xa8/0xc4
[C0000000B565BB80] [C0000000001C3838] .__pci_register_driver+0x5c/0xa4
[C0000000B565BC10] [D0000000007D14D4] .pdc2027x_init+0x20/0x45c [pata_pdc2027x]
[C0000000B565BC90] [C000000000090B50] .sys_init_module+0x1764/0x1998
[C0000000B565BE30] [C0000000000086A4] syscall_exit+0x0/0x40
Unable to handle kernel paging request for data at address 0x3a30322e332f3040
Faulting instruction address: 0xc0000000000843e4
cpu 0x4: Vector: 300 (Data Access) at [c0000000b565af10]
    pc: c0000000000843e4: .kthread_stop+0x3c/0xfc
    lr: c0000000000843e0: .kthread_stop+0x38/0xfc
    sp: c0000000b565b190
   msr: 8000000000009032
   dar: 3a30322e332f3040
 dsisr: 40000000
  current = 0xc0000000ea1294d0
  paca    = 0xc000000000475600
    pid   = 15221, comm = modprobe
*********************************************



Third occurrence:
**********************************************
<7>pata_pdc2027x 0001:cc:01.0: version 0.74-ac5
<6>pata_pdc2027x 0001:cc:01.0: PLL input clock 32758 kHz
<6>ata9887: PATA max UDMA/133 cmd 0xD000080084DC07C0 ctl 0xD000080084DC0FDA bmdma 0xD000080084DC0000 irq 166
<6>ata9888: PATA max UDMA/133 cmd 0xD000080084DC05C0 ctl 0xD000080084DC0DDA bmdma 0xD000080084DC0008 irq 166
<6>scsi9887 : pata_pdc2027x
<6>ata9887.00: ATAPI, max UDMA/33
<6>ata9887.00: configured for UDMA/33
<6>scsi9888 : pata_pdc2027x
<4>ATA: abnormal status 0x8 on port 0xD000080084DC05DF
<5>  Vendor: IBM       Model: DROM00205         Rev: NR36
<5>  Type:   CD-ROM                             ANSI SCSI revision: 02
<4>sr0: scsi3-mmc drive: 61x/61x cd/rw xa/form2 cdda tray
<7>sr 9887:0:0:0: Attached scsi CD-ROM sr0
<5>sr 9887:0:0:0: Attached scsi generic sg0 type 5
<4>ata9887.00: disabled
<1>Unable to handle kernel paging request for data at address 0xd000000000047878
<1>Faulting instruction address: 0xd0000000000821f0
cpu 0x1: Vector: 300 (Data Access) at [c000000070f03580]
pc: d0000000000821f0: .scsi_device_dev_release_usercontext+0x40/0x1ac [scsi_mod]
    lr: c000000000077394: .execute_in_process_context+0x54/0xa0
    sp: c000000070f03800
   msr: 8000000000009032
   dar: d000000000047878
 dsisr: 40000000
  current = 0xc000000002cacad0
  paca    = 0xc0000000004a3780
    pid   = 2086, comm = hald

