Olaf Hering wrote:
> On Mon, Feb 20, Brian King wrote:
>
>> Olaf Hering wrote:
>>> 1:mon> d c0000000024cacc8
>>> c0000000024cacc8 00000000dead4ead ffffffff00000000 |......N.........|
>>> c0000000024cacd8 ffffffffffffffff c0000000024cace0 |.............L..|
>>> c0000000024cace8 c0000000024cace0 c000000000614f68 |.....L.......aOh|
>>> c0000000024cacf8 c000000000614f38 0000000000000000 |.....aO8........|
>>> c0000000024cad08 0000000000000000 0000000000000000 |................|
>>> c0000000024cad18 0000000000000000 0000000000000000 |................|
>>> c0000000024cad28 0000000000000000 0000000000000000 |................|
>>> c0000000024cad38 0000000000000000 0000000000000000 |................|
>>> c0000000024cad48 0000000000000000 0000000000000000 |................|
>>> c0000000024cad58 0000000000000000 0000000000000000 |................|
>>> c0000000024cad68 0000000000000000 0000000000000000 |................|
>>> c0000000024cad78 0000000000000000 0000000000000000 |................|
>>> c0000000024cad88 0000000000000000 0000000000000000 |................|
>>> c0000000024cad98 0000000000000000 0000000000000000 |................|
>>> c0000000024cada8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadb8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadc8 0000000000000000 0000000000000000 |................|
>>> c0000000024cadd8 0000000000000000 0000000000000000 |................|
>> I've now seen a couple recreates of this problem on various systems in
>> our labs, and there are always a bunch of zeroes in the struct device
>> in the same place as above. I wonder if perhaps the call to device_add
>> is failing in scsi_alloc_target. Failure of this call is not being handled
>> today. Can you give the attached patch a try?
>
> This fixes it, tested with plain rc3. Lots of -EEXIST, I wonder if the real
> bug is elsewhere.

I would guess that the -EEXIST is coming from:

create_dir
sysfs_create_dir
create_dir
kobject_add
device_add

Looking at the scsi_target reap code, it looks like there is a race
condition. The target is removed from the host's list of targets under
the host lock, then the host lock is released. If another thread tries
to add the same target while it is being torn down at this point (before
device_del), the device_add will fail with -EEXIST, since the sysfs
directory for the device still exists. Any reason we can't protect the
target reaping code from this by grabbing the scan_mutex?

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
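
To make the suspected interleaving concrete, here is a minimal userspace sketch in C. It is not kernel code, and it has not been checked against the actual scsi_alloc_target or reap paths. Every name in it (name_set, model_host, the model_* functions, the "target0:0:1" string) is hypothetical, and the order of operations inside model_alloc_target is a simplifying assumption. One name set stands in for the host's target list, the other for the sysfs directory namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 8
#define NAME_LEN  32

/* Hypothetical stand-in for "a set of unique names". */
struct name_set {
    char names[MAX_NAMES][NAME_LEN];
    bool used[MAX_NAMES];
};

/* Return the slot holding name, or -1 if it is not present. */
static int set_find(const struct name_set *s, const char *name)
{
    for (int i = 0; i < MAX_NAMES; i++)
        if (s->used[i] && strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/* Add name; fails with -EEXIST on a duplicate, -ENOSPC when full. */
static int set_add(struct name_set *s, const char *name)
{
    if (set_find(s, name) >= 0)
        return -EEXIST;
    for (int i = 0; i < MAX_NAMES; i++) {
        if (!s->used[i]) {
            strncpy(s->names[i], name, NAME_LEN - 1);
            s->names[i][NAME_LEN - 1] = '\0';
            s->used[i] = true;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Remove name if present. */
static void set_del(struct name_set *s, const char *name)
{
    int i = set_find(s, name);
    if (i >= 0)
        s->used[i] = false;
}

/* Model of a host: its target list plus the sysfs directory names. */
struct model_host {
    struct name_set targets; /* models the host's list of targets */
    struct name_set sysfs;   /* models existing sysfs directories */
};

/* Models target allocation: reuse a listed target, else register it. */
int model_alloc_target(struct model_host *h, const char *name)
{
    if (set_find(&h->targets, name) >= 0)
        return 0;                        /* already known, reuse it */
    int ret = set_add(&h->sysfs, name);  /* models device_add */
    if (ret)
        return ret;                      /* e.g. -EEXIST */
    return set_add(&h->targets, name);
}

/* Reap step 1: unlink from the target list (done under the host lock). */
void model_reap_unlink(struct model_host *h, const char *name)
{
    set_del(&h->targets, name);
}

/* Reap step 2: remove the sysfs directory (models device_del). */
void model_reap_device_del(struct model_host *h, const char *name)
{
    set_del(&h->sysfs, name);
}

/* The add lands between the two reap steps. */
int model_racy_interleaving(void)
{
    struct model_host h;
    const char *name = "target0:0:1";
    memset(&h, 0, sizeof(h));
    assert(model_alloc_target(&h, name) == 0);
    model_reap_unlink(&h, name);
    int ret = model_alloc_target(&h, name); /* "other thread" adds here */
    model_reap_device_del(&h, name);
    return ret;
}

/* Both reap steps finish before the add, as if one mutex covered them. */
int model_serialized_interleaving(void)
{
    struct model_host h;
    const char *name = "target0:0:1";
    memset(&h, 0, sizeof(h));
    assert(model_alloc_target(&h, name) == 0);
    model_reap_unlink(&h, name);
    model_reap_device_del(&h, name);
    return model_alloc_target(&h, name);
}
```

In this model, model_racy_interleaving() returns -EEXIST and model_serialized_interleaving() returns 0. It only shows why a stale sysfs entry would produce the error. It does not show that holding scan_mutex across the reap is safe in the real code.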