On Wed, 2017-12-06 at 08:41 +0800, Jason Yan wrote: > On 2017/12/5 23:37, James Bottomley wrote: > > > > On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote: > > > > > > > > > On 2017/12/1 23:35, James Bottomley wrote: > > > > > > > > > > > > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: > > > > > > > > > > > > > > > On 2017/12/1 7:56, James Bottomley wrote: > > > > > > > > > > > > > > > > > > b/include/scsi/scsi_device.h > > > > > > index 571ddb49b926..2e4d48d8cd68 100644 > > > > > > --- a/include/scsi/scsi_device.h > > > > > > +++ b/include/scsi/scsi_device.h > > > > > > @@ -380,6 +380,23 @@ extern struct scsi_device > > > > > > *__scsi_iterate_devices(struct Scsi_Host *, > > > > > > #define __shost_for_each_device(sdev, shost) \ > > > > > > list_for_each_entry((sdev), &((shost)- > > > > > > >__devices), > > > > > > siblings) > > > > > > > > > > > > > > > > Seems that __shost_for_each_device() is still not safe. scsi > > > > > device > > > > > been deleted stays in the list and put_device() can be called > > > > > anywhere out of the host lock. > > > > > > > > Not if it's used with scsi_get_device(). As I said, I only did > > > > a cursory inspectiont, so if I've missed a loop, please > > > > specify. > > > > > > > > The point was more a demonstration of how we could fix the > > > > problem if we don't change get_device(). > > > > > > > > James > > > > > > > > > > Yes, it's OK now. __shost_for_each_device() is not used with > > > scsi_get_device() yet. > > > > > > Another problem is that put_device() cannot be called while > > > holding the host lock, > > > > Yes it can. That's one of the design goals of the execute in > > process context: you can call it from interrupt context and you can > > call it with locks held and we'll return immediately and delay all > > the dangerous stuff until we have a process context. > > > > To get the process context to be acquired, the in_interrupt() test > > must pass (so the spin lock must be acquired irqsave) ; is that > > condition missing anywhere? > > > > James > > > > > > Call it from interrupt context is ok. I'm talking about calling it > from process context. > > Think about this in a process context: > scsi_device_lookup() > ->spin_lock_irqsave(shost->host_lock, flags); > ->__scsi_device_lookup() > ->iterate and kobject_get_unless_zero() > ->put_device() > ->scsi_device_dev_release() if the last put > ->scsi_device_dev_release_usercontext() > ->acquire the host lock = deadlock execute_in_process_context() is supposed to produce us a context whenever the local context isn't available, and that's supposed to include when interrupts are disabled as in spin_lock_irqsave(). So let me ask this another way: have you seen this deadlock (which would mean we have a bug in execute_process_context())? James