On Wed, Jan 13, 2016 at 08:10:31AM -0800, James Bottomley wrote:
> The starting node for a klist iteration is often passed in from
> somewhere way above the klist infrastructure, meaning there's no
> guarantee the node is still on the list. We've seen this in SCSI where
> we use bus_find_device() to iterate through a list of devices. In the
> face of heavy hotplug activity, the last device returned by
> bus_find_device() can be removed before the next call. This leads to
>
> Dec 3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
> Dec 3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
> Dec 3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
> Dec 3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
> Dec 3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
> Dec 3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
> Dec 3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
> Dec 3 13:22:02 localhost kernel: Call Trace:
> Dec 3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
> Dec 3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
> Dec 3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
> Dec 3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
> Dec 3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
> Dec 3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
> Dec 3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
> [...]
>
> And an eventual crash. It can actually occur in any hotplug system
> which has a device finder and a starting device.
>
> We can fix this globally by making sure the starting node for
> klist_iter_init_node() is actually a member of the list before using it
> (and by starting from the beginning if it isn't).
>
> Reported-by: Ewan D. Milne <emilne@xxxxxxxxxx>
> Tested-by: Ewan D. Milne <emilne@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>
> ---
>
> diff --git a/lib/klist.c b/lib/klist.c
> index d74cf7a..0507fa5 100644
> --- a/lib/klist.c
> +++ b/lib/klist.c
> @@ -282,9 +282,9 @@ void klist_iter_init_node(struct klist *k, struct klist_iter *i,
>  			  struct klist_node *n)
>  {
>  	i->i_klist = k;
> -	i->i_cur = n;
> -	if (n)
> -		kref_get(&n->n_ref);
> +	i->i_cur = NULL;
> +	if (n && kref_get_unless_zero(&n->n_ref))
> +		i->i_cur = n;
>  }
>  EXPORT_SYMBOL_GPL(klist_iter_init_node);

Thanks for this, looks good, I'll queue it up after 4.5-rc1 is out.

greg k-h
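
For anyone reading along, the idea behind the patch can be sketched in userspace as below. This is only an illustration, not kernel code: struct ref, struct node, struct iter, ref_get_unless_zero() and iter_init_node() are hypothetical stand-ins for struct kref, struct klist_node, struct klist_iter, kref_get_unless_zero() and klist_iter_init_node(), and C11 atomics stand in for the kernel's refcounting. It also assumes the node's memory is still valid when the count is examined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's kref/klist types. */
struct ref { atomic_int count; };
struct node { struct ref n_ref; };
struct iter { struct node *i_cur; };

/*
 * Take a reference only if the count is still nonzero.
 * Models the behaviour of kref_get_unless_zero().
 */
static bool ref_get_unless_zero(struct ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current count; retry. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Mirrors the patched logic: a start node whose count already hit zero
 * is ignored, so the iterator falls back to "no current node", which
 * the caller treats as starting from the head of the list.
 */
static void iter_init_node(struct iter *i, struct node *n)
{
	assert(i != NULL);

	i->i_cur = NULL;
	if (n && ref_get_unless_zero(&n->n_ref))
		i->i_cur = n;
}
```

With a node whose count is 1, iter_init_node() bumps the count to 2 and records the node. With a node whose count has already dropped to 0, the count is left alone and i_cur stays NULL, instead of a reference being taken on a dying object as the old unconditional kref_get() did.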