Hi all,
I'm trying to understand the asynchronous abort path in the current
upstream code, and its locking looks dubious to me.
Here are two examples of the issue:
1) dangling pointers: scsi_put_command calls cancel_delayed_work(), but
that does not wait for an scmd_eh_abort_handler that is already
running. If the scmd_eh_abort_handler starts while the softirq handler
is calling scsi_put_command (e.g. scsi_finish_command ->
scsi_io_completion -> scsi_end_request -> scsi_next_command), the
pointer to the Scsi_Cmnd* becomes invalid in the middle of the abort
handler.
2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
concurrently, and both can call scsi_finish_command on the same command
with no lock protecting the calls. You can then get memory corruption.
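To make (2) concrete, here is a small userspace sketch of the kind of
"finish exactly once" guard that I would expect to see somewhere. It is
not kernel code and not a proposed patch: struct model_cmd,
model_cmd_init and model_finish_once are names I made up for
illustration, and C11 atomics stand in for whatever primitive the
midlayer would really use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a command; NOT the kernel's command struct. */
struct model_cmd {
    atomic_bool finished;  /* set by whichever path finishes the command first */
    int finish_calls;      /* how many times the completion body actually ran */
};

static void model_cmd_init(struct model_cmd *cmd)
{
    atomic_init(&cmd->finished, false);
    cmd->finish_calls = 0;
}

/*
 * Both the (modelled) softirq path and the (modelled) abort handler
 * would call this. atomic_exchange returns the previous value, so only
 * the first caller sees false and runs the completion body; any later
 * caller backs off.
 */
static bool model_finish_once(struct model_cmd *cmd)
{
    if (atomic_exchange(&cmd->finished, true))
        return false;      /* somebody else already finished this command */
    cmd->finish_calls++;   /* placeholder for the real completion work */
    return true;
}
```

With such a guard the loser of the race knows it lost and can skip the
second completion. As far as I can tell nothing equivalent protects the
two scsi_finish_command callers today.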
I don't have a reproducer for either case; we're seeing related crashes
in virtio-scsi EH, but those are due to a bug in the driver. Still, the
two races mean that I have no sensible way to write the
eh_abort_handler.
Example (1) means that the eh_abort_handler cannot use the passed
Scsi_Cmnd, because it might not even be valid when entering the
eh_abort_handler. Example (2) means that the eh_abort_handler cannot
return SUCCESS if it detects that the command has been completed in the
meantime, because the command could then be finished a second time.
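For the lifetime problem in (1), the usual pattern would be for the
abort work to hold its own reference on the command from the moment it
is queued. Again a userspace sketch only, under the assumption that a
simple reference count is enough; struct model_cmd_ref and the
model_ref_* helpers are hypothetical names, not existing kernel
functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted command; NOT the kernel's command struct. */
struct model_cmd_ref {
    atomic_int refs;  /* number of holders still using the command */
    bool released;    /* set when the last reference is dropped */
};

static void model_ref_init(struct model_cmd_ref *cmd)
{
    atomic_init(&cmd->refs, 1);  /* initial reference: the completion path */
    cmd->released = false;
}

/* Would be taken when the abort work is queued, before the handler runs. */
static void model_ref_get(struct model_cmd_ref *cmd)
{
    atomic_fetch_add(&cmd->refs, 1);
}

/*
 * Drops one reference. atomic_fetch_sub returns the value before the
 * subtraction, so a result of 1 means this caller held the last
 * reference and is the one allowed to release the command.
 */
static bool model_ref_put(struct model_cmd_ref *cmd)
{
    if (atomic_fetch_sub(&cmd->refs, 1) == 1) {
        cmd->released = true;  /* placeholder for the real release */
        return true;
    }
    return false;
}
```

In this model a put from the completion path leaves the command alive
until the abort handler drops its own reference, so the pointer passed
to the handler stays valid for the whole call.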
Paolo