On 11/22/2011 09:59 AM, Hannes Reinecke wrote:
> On 11/21/2011 06:32 PM, Petr Tesarik wrote:
>> Hi folks,
>>
>> I've been working on a kernel crash dump of an ancient kernel recently, and I
>> have come to the conclusion that walking the scsi devices via
>> bus_find_device() is completely flawed. While looking for an upstream fix, I
>> didn't find any, so the same flaw is probably still there. However, let me ask
>> here to check how this is supposed to work.
>>
>> First, this is how I understand the issue. The "/proc/scsi/scsi" file is
>> handled as a pretty standard seqfile, iterating over the devices with the
>> following function:
>>
>> static inline struct device *next_scsi_device(struct device *start)
>> {
>> 	struct device *next = bus_find_device(&scsi_bus_type, start, NULL,
>> 					      always_match);
>> 	put_device(start);
>> 	return next;
>> }
>>
>> The returned value is used for the next iteration. Now, bus_find_device()
>> assumes that the device is still attached to the knode_bus klist, because
>> that's how it initializes the klist iterator. When it finds the next device,
>> it increments the reference count on the device with get_device(), but it
>> doesn't do anything about the knode_bus field. So, when somebody calls
>> scsi_remove_device() on the current device between two calls to
>> next_scsi_device, then it does:
>>
>> 	if (sdev->is_visible) {
>> 		[...]
>> 		device_del(dev);
>>
>> which in turn calls:
>>
>> 	bus_remove_device(dev);
>>
>> which does:
>>
>> 	if (klist_node_attached(&dev->p->knode_bus))
>> 		klist_del(&dev->p->knode_bus);
>>
>> So, even though the struct device has a non-zero refcount, the code in
>> next_scsi_device cannot continue, because it only has a stale pointer to an
>> already detached klist, right?
>>
>> At least that's what I saw in 2.6.16, and I can still see the same thing
>> possible in 3.1.
>>
> Hmm. Looks like we need to increase the refcount to knode_bus when
> we iterate devices.
> Let me check ...
>
No, this seems to be okay.
klists are protected by their own refcounting in ->n_ref (via
klist_dec_and_del()): a node removed with klist_del() is only unlinked
from the list once its last reference is dropped, so the list processing
can continue.

However, seeing that you're working with 2.6.16: there was a rather
serious issue with scsi device scanning, which has been fixed by commit
32aeef605aa01e1fee45e052eceffb00e72ba2b0. Please check whether that
patch is included.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)