Note: description below applies to 2.6.24 as that's what was used to isolate the problem being described. There are comments at the end with regard to 2.6.25-rc4-git3. The below email thread directly applies to this bug/patch. http://marc.info/?l=linux-scsi&m=120222578814366&w=2 -- While performing raid reset testing using 2.6.24, every once in the while I noticed that one of the raid's management tools would not exit. I tracked this down to it waiting on an inquiry command's completion. An infinite loop occurs because there is an occasional inquiry command submitted to the host driver with the associated bio having a bi_size field of 512 when the inquiry command's transfer length is 255. This "enlarged" bi_size field causes the completion handler to re-issue the inquiry command because there are "leftovers" following the completion. scsi_finish_command() calls scsi_io_completion() with good_bytes equal to cmd->request_bufflen. scsi_io_completion() sets req->data_len to the residual provided by the LLDD. scsi_io_completion() calls scsi_end_request() with the good_bytes value of 255. scsi_end_request() calls end_that_request_chunk() which calls __end_that_request_first() which uses bio->bi_size as the size of the request. As bi_size is larger than the number of bytes (nr_bytes) being completed, it returns "1" indicating that there wasn't enough data to finish the request. scsi_end_request() then calls scsi_requeue_command() to resubmit and fetch that data which wasn't received. This results in the command being reissued using the residual value returned by the driver as the request_bufflen for the next command. This repeats itself until the residual is zero and bi_size is non-zero at which point the command gets reissued with a request_bufflen of zero and sc_data_direction of DMA_NONE. The subsequent completions are retried infinitely because bi_size is still non-zero. What drove me crazy is that this didn't happen with every inquiry command issued by the application. It was very infrequent. I finally tracked it down to the sg driver. sg_ioctl() calls sg_new_write() which calls sg_common_write() which calls sg_start_req(). sg_start_req() either calls sg_link_reserve() or sg_build_indirect() to set up the data transfer. The call to sg_build_indirect() ROUNDS UP THE REQUEST SIZE to a sector boundary, and this is what causes the infinite loop. I tried removing the round up statement in sg_build_indirect() and forcing sg_start_req() to always call sg_build_indirect() to test the change and the infinite loop did not occur. To me, it seems that if sg_link_reserve() has no need to round up to a sector boundary then sg_build_indirect() should also not need to round up the request length. I've included a patch (against 2.6.25-rc4-git3) which removes the rounding in sg_build_indirect() for your comments. I then tried my test case with 2.6.25-rc4-git3 and it didn't happen. I noticed the change in the routine scsi_req_map_sg() which deletes a line which incremented the "data_len" variable at the start of the for_each_sg() loop. This was also a line which I examined in 6.5.24. But, I noticed comments about sg indicating that it "sends a scatterlist that is larger than the data_len it wants transferred for certain IO sizes", and as that appeared to be what was happening, I went in search of what was up with sg. So, will the below patch break anything? Signed-off-by: Michael Reed <mdr@xxxxxxx> --- linux-2.6.25-rc4-git3/drivers/scsi/sg.c 2008-03-10 13:21:06.000000000 -0700 +++ linux-2.6.25-rc4-git3-modified/drivers/scsi/sg.c 2008-03-11 14:47:32.376281307 -0700 @@ -1829,17 +1829,13 @@ struct scatterlist *sg; int ret_sz = 0, k, rem_sz, num, mx_sc_elems; int sg_tablesize = sfp->parentdp->sg_tablesize; - int blk_size = buff_size; struct page *p = NULL; - if (blk_size < 0) + if (buff_size < 0) return -EFAULT; - if (0 == blk_size) - ++blk_size; /* don't know why */ -/* round request up to next highest SG_SECTOR_SZ byte boundary */ - blk_size = (blk_size + SG_SECTOR_MSK) & (~SG_SECTOR_MSK); - SCSI_LOG_TIMEOUT(4, printk("sg_build_indirect: buff_size=%d, blk_size=%d\n", - buff_size, blk_size)); + + SCSI_LOG_TIMEOUT(4, printk("sg_build_indirect: buff_size=%d\n", + buff_size)); /* N.B. ret_sz carried into this block ... */ mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize); @@ -1854,7 +1850,7 @@ } else scatter_elem_sz_prev = num; } - for (k = 0, sg = schp->buffer, rem_sz = blk_size; + for (k = 0, sg = schp->buffer, rem_sz = buff_size; (rem_sz > 0) && (k < mx_sc_elems); ++k, rem_sz -= ret_sz, sg = sg_next(sg)) { @@ -1880,7 +1876,7 @@ SCSI_LOG_TIMEOUT(5, printk("sg_build_indirect: k_use_sg=%d, " "rem_sz=%d\n", k, rem_sz)); - schp->bufflen = blk_size; + schp->bufflen = buff_size; if (rem_sz > 0) /* must have failed */ return -ENOMEM; -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html