[Bug 12020] scsi_times_out NULL pointer dereference

bugme-daemon@xxxxxxxxxxxxxxxxxxx · Thu, 13 Nov 2008 11:03:42 -0800 (PST)

http://bugzilla.kernel.org/show_bug.cgi?id=12020

------- Comment #1 from anonymous@xxxxxxxxxxxxxxxxxxxx  2008-11-13 11:03 -------
Reply-To: James.Bottomley@xxxxxxxxxxxxxxxxxxxxx

On Thu, 2008-11-13 at 10:30 -0800, bugme-daemon@xxxxxxxxxxxxxxxxxxx
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12020
> 
>            Summary: scsi_times_out NULL pointer dereference
>            Product: SCSI Drivers
>            Version: 2.5
>      KernelVersion: 2.6.28-git20081113
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: scsi_drivers-other@xxxxxxxxxxxxxxxxxxxx
>         ReportedBy: bs@xxxxxxxxx
> 
> 
> Latest working kernel version: 2.6.27
> Earliest failing kernel version: 2.6.28-rc4
> Hardware Environment: Infortrend G2430 connected to LSI22320R
> Problem Description:
> 
> Hello,
> 
> first, in 2.6.28-rc{1,2,3} the error handler was entirely broken: it
> deadlocked. In rc4 this is fixed, but I have now twice hit a NULL
> pointer dereference while doing some error handler tests. All of that looks
> like it is due to the scsi timeout commits.
> 
> Steps to reproduce: e.g. reset devices connected to an LSI 53C1030 using
> lsiutil. Can be reproduced on about 20% of eh activations.
> 
> (gdb) l *(scsi_times_out+0x15)
> 0xffffffff80460f1e is in scsi_times_out (drivers/scsi/scsi_error.c:176).
> 171             enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
> 172             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 173
> 174             scsi_log_completion(scmd, TIMEOUT_ERROR);
> 175
> 176             if (scmd->device->host->transportt->eh_timed_out)
> 177                     eh_timed_out = scmd->device->host->transportt->eh_timed_out;
> 178             else if (scmd->device->host->hostt->eh_timed_out)
> 179                     eh_timed_out = scmd->device->host->hostt->eh_timed_out;
> 180             else

Actually, I think the trace is slightly off.  I suspect this is the
problem:

        struct scsi_cmnd *scmd = req->special;

I bet req->special is NULL because the command timed out even before it
was prepared by the subsystem.

Does this fix it?

The fix is more of a band-aid than anything: we can't really have
commands timing out in the mid-layer, because we expect to have full
control of them there.  With this patch, if we run out of resets, block
will complete a command we're still processing.
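
For anyone who wants to see the suspected failure mode in isolation,
here is a minimal userspace sketch.  It is not kernel code:
fake_request, fake_cmnd and model_times_out are made-up stand-ins, and
the enum is redeclared locally only so the example compiles on its own.
It models nothing but the NULL check from the patch below.

```c
#include <stddef.h>
#include <stdio.h>

/* Local redeclaration for illustration; NOT the kernel header. */
enum blk_eh_timer_return {
	BLK_EH_NOT_HANDLED,
	BLK_EH_HANDLED,
	BLK_EH_RESET_TIMER,
};

/* Hypothetical stand-in for struct scsi_cmnd. */
struct fake_cmnd {
	int id;
};

/* Hypothetical stand-in for struct request. */
struct fake_request {
	void *special;	/* stays NULL until the command has been prepared */
};

/*
 * Models only the guard added by the patch, not the rest of
 * scsi_times_out().
 */
enum blk_eh_timer_return model_times_out(struct fake_request *req)
{
	struct fake_cmnd *scmd = req->special;

	if (!scmd)
		/* timed out before it was prepared: ask for a timer reset */
		return BLK_EH_RESET_TIMER;

	/* only reached with a non-NULL scmd, so this dereference is safe */
	printf("command %d timed out\n", scmd->id);
	return BLK_EH_NOT_HANDLED;
}
```

Calling model_times_out() on a fake_request whose special field is NULL
returns BLK_EH_RESET_TIMER instead of dereferencing the pointer; with a
prepared command it falls through to the normal path.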

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 94ed262..5612c42 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -127,6 +127,13 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
        enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
        enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;

+       if (!scmd)
+               /*
+                * nasty: command timed out before the mid layer
+                * even prepared it
+                */
+               return BLK_EH_RESET_TIMER;
+
        scsi_log_completion(scmd, TIMEOUT_ERROR);

        if (scmd->device->host->transportt->eh_timed_out)

