On Wed, Mar 23, 2011 at 11:26 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > On Wed, Mar 23, 2011 at 05:07:49PM -0700, Dan Williams wrote: >> One of the fixes I made in that interim timer api cleanup (8d32e5b2) >> was to use del_timer_sync on all pre-allocated timers to make sure >> nothing fired or was in the process of firing as the driver was >> brought down (unlikely). The core runs under >> spin_lock_irqsave(ihost->scic_lock) context, and the expectation is >> that all these timeout callbacks are done in that context. > > I'm afraid that you'll have to fix that design mistake first. Running > large amounts of code under a spinlock just is not a good idea for > a wide range of reasons. I have been waiting for this issue to be raised. This was one of the first items brought up in our internal reviews of the Linux driver. Why does it still exist and what is the rationale for addressing it incrementally?: Starting with simple locking and then scaling it is arguably easier than unwinding a more complex locking scheme implemented too early in the design phase. The lock synchronizes i/o submission against the completion/event-queue. When the event queue runs the state of the world might change (phy, port, task context, remote node context), so the lock is ensuring a self consistent view of the state machines for new submissions. Now that the core is a slower moving target (in terms of code changes) we are incrementally peeling back / minimizing the code run under the lock where it is found to be problematic. I will say that in terms of lock overhead investigations one of things that makes this lock heavier than it needs to be (irqs disabled) is libata's need to hold host_lock over submissions. That situation is problematic for other reasons [1], and is something that has been in the back of my mind since first evaluating isci locking. I expect our fast path locking overhead is on par with mvsas which holds an irqs-disabled lock across the interrupt handler, and the entire lldd_execute_task path. The higher touch slow path cases, like SATA exception / PIO handling, probably have similar overhead to libata's locking, but yes this needs to quantified and addressed. -- Dan [1]: http://marc.info/?l=linux-scsi&m=129791105419577&w=2 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html