On Wed, Feb 03, 2016 at 11:38:16PM +0100, Sebastian Herbszt wrote:
> James Bottomley wrote:
> > On Mon, 2016-02-01 at 19:43 -0800, Bart Van Assche wrote:
> > > On 01/19/16 17:03, James Bottomley wrote:
> > > > On Tue, 2016-01-19 at 19:30 -0500, Martin K. Petersen wrote:
> > > > > > > > > > "Bart" == Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> > > > > > > > > > writes:
> > > > >
> > > > > Bart> Instead of representing the states "visible in sysfs" and "has
> > > > > Bart> been removed from the target list" by a single state variable, use
> > > > > Bart> two variables to represent this information.
> > > > >
> > > > > James: Are you happy with the latest iteration of this? Should I
> > > > > queue it?
> > > >
> > > > Well, I'm OK with the patch: it's a simple transformation of the
> > > > enumerated state to a two bit state. What I can't see is how it fixes
> > > > any soft lockup.
> > > >
> > > > The only change from the current workflow is that the DEL transition
> > > > (now the reaped flag) is done before the spin lock is dropped which
> > > > would fix a tiny window for two threads both trying to remove the same
> > > > target, but there's nothing that could possibly fix an iterative soft
> > > > lockup caused by restarting the loop, which is what the changelog says.
> > >
> > > Hello James,
> > >
> > > scsi_remove_target() doesn't lock the scan_mutex which means that
> > > concurrent SCSI scanning activity is not prohibited. Such scanning
> > > activity can postpone the transition of the state of a SCSI target
> > > into STARGET_DEL. I think if the scheduler decides to run the thread
> > > that executes scsi_remove_target() on the same CPU as the scanning
> > > code after the scanning code has obtained a reap ref and before the
> > > scanning code has released the reap ref again that the soft lockup
> > > can be triggered that has been reported by Sebastian Herbszt.
> >
> > OK, I finally understand the scenario; I'm not sure I understand how
> > we're getting concurrent scanning and removal from a simple rmmod ... I
> > take it this is insmod rmmod in a tight loop?
>
> I am able to trigger the soft lockup with this test case run once:
>
> modprobe lpfc
> run fio for 10 seconds
> rmmod lpfc
>
> My test setup involves running qla2xxx in target mode (SCST) and
> lpfc as initiator on the same system with one exported volume.
>
> Dick, how did you trigger the lockup?
>
> Sebastian

Hi James, Bart, Martin,

Have you already decided which of the two patches you favour and when it
will be included?

I have several customer reports of this lockup, and I don't want to include
one of the patches from the list only to find out that the other one is used
in mainline.

Thanks in advance,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850