On 04/20/2011 07:24 PM, Bhanu Prakash Gollapudi wrote:
Hi,

We are seeing an issue similar to what Joe observed a while back: http://www.mail-archive.com/devel@xxxxxxxxxxxxx/msg02993.html. It happens in a corner case, when creating and destroying an fcoe interface in a tight loop (fcoeadm -c followed by fcoeadm -d). The system had a simple configuration with a single local port and 2 remote ports.

Reason for the deadlock:

1. The destroy (fcoeadm -d) thread hangs in fc_remove_host().
2. fc_remove_host() is trying to flush the shost->work_q via scsi_flush_work(), but the operation never completes.
3. There are two works scheduled to run in this work_q, one belonging to rport A and the other to rport B.
4. The thread is currently executing rport_delete_work (fc_rport_final_delete) for rport A. It calls fc_terminate_rport_io(), which unblocks the sdev->request_queue so that __blk_run_queue() can be called. So IO for rport A is ready to run, but is stuck at the async layer.
5. Meanwhile, the async layer is serializing all the IOs belonging to both rport A and rport B. At this point, it is waiting for the IO belonging to rport B to complete.
6. However, the request_queue for rport B is stopped, and fc_terminate_rport_io() has not yet been called on rport B to unblock the device; it will only be called after rport A completes. rport A does
Is rport B's terminate_rport_io not called yet because rport B's work is queued behind rport A's work on the same workqueue, so rport B's work function never gets to run? If so, have you tested this with the current upstream kernel?
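
The ordering being asked about can be written down as a small wait-for graph. What follows is only a userspace Python sketch, not kernel code: the node names (work_A, io_A, io_B, work_B) and the helper find_cycle() are made up for illustration, and the edges are an assumed reading of steps 3 to 6 of the report, not something confirmed by a trace.

```python
# Userspace model only. Each entry reads "key cannot finish until value finishes".
# Node names are hypothetical; edges paraphrase steps 3-6 of the report.
WAITS_ON = {
    "work_A": "io_A",    # rport A's delete work is stuck behind its own IO (step 4)
    "io_A": "io_B",      # async layer serializes A's IO behind B's IO (step 5)
    "io_B": "work_B",    # B's request_queue is only unblocked by B's work (step 6)
    "work_B": "work_A",  # assumption: same work_q, so B's work runs after A's returns
}


def find_cycle(waits_on, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None:
        if node in seen:
            # We came back to a node already on the path: that tail is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_on.get(node)
    return None


if __name__ == "__main__":
    # With the work_q ordering edge in place, the chain closes on itself.
    print(find_cycle(WAITS_ON, "work_A"))

    # Without that edge (B's work not stuck behind A's), the chain ends.
    independent = dict(WAITS_ON)
    del independent["work_B"]
    print(find_cycle(independent, "work_A"))
```

In this model the cycle only closes through the last edge, which is why it matters whether rport B's work really sits behind rport A's on the same work_q.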