> On Aug 10, 2015, at 10:41 PM, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote: >> Introduce support for enhanced I/O error handling. >> >> Signed-off-by: Matthew R. Ochs <mrochs@xxxxxxxxxxxxxxxxxx> >> Signed-off-by: Manoj N. Kumar <manoj@xxxxxxxxxxxxxxxxxx> >> --- > > So I'm not necessarily very qualified to review SCSI bits as I haven't > done anything close to the Linux SCSI code for almost a decade but I > have a couple of questions and nits… Thanks for reviewing Ben! > >> wait_queue_head_t tmf_waitq; >> bool tmf_active; >> - u8 err_recovery_active:1; >> + wait_queue_head_t limbo_waitq; >> + enum cxlflash_state state; >> }; >> >> struct afu_cmd { >> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c >> index 76a7286..18359d4 100644 >> --- a/drivers/scsi/cxlflash/main.c >> +++ b/drivers/scsi/cxlflash/main.c >> @@ -380,6 +380,18 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp) >> } >> spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags); >> >> + switch (cfg->state) { >> + case STATE_LIMBO: >> + pr_debug_ratelimited("%s: device in limbo!\n", __func__); >> + rc = SCSI_MLQUEUE_HOST_BUSY; >> + goto out; >> + case STATE_FAILTERM: >> + pr_debug_ratelimited("%s: device has failed!\n", __func__); >> + goto error; >> + default: >> + break; >> + } > > I noticed that you mostly read and write that new state outside of your > tmf_waitq.lock. Is there any other lock or mutex protecting you ? In > this specific case ? No there isn’t. > > I know in the old day queuecommand was called with a host lock, is that > still the case ? No, it’s no longer the case. > Or you just don't care about an occasional spurrious > SCSI_MLQUEUE_HOST_BUSY ? This is pretty much true. =) > >> cmd = cxlflash_cmd_checkout(afu); >> if (unlikely(!cmd)) { >> pr_err("%s: could not get a free command\n", __func__); >> @@ -428,6 +440,10 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp) >> >> out: >> return rc; >> +error: >> + scp->result = (DID_NO_CONNECT << 16); >> + scp->scsi_done(scp); >> + return 0; >> } >> >> /** >> @@ -455,9 +471,21 @@ static int cxlflash_eh_device_reset_handler(struct scsi_cmnd *scp) >> get_unaligned_be32(&((u32 *)scp->cmnd)[2]), >> get_unaligned_be32(&((u32 *)scp->cmnd)[3])); >> >> - rcr = send_tmf(afu, scp, TMF_LUN_RESET); >> - if (unlikely(rcr)) >> + switch (cfg->state) { >> + case STATE_NORMAL: >> + rcr = send_tmf(afu, scp, TMF_LUN_RESET); >> + if (unlikely(rcr)) >> + rc = FAILED; >> + break; >> + case STATE_LIMBO: >> + wait_event(cfg->limbo_waitq, cfg->state != STATE_LIMBO); >> + if (cfg->state == STATE_NORMAL) >> + break; >> + /* fall through */ >> + default: >> rc = FAILED; >> + break; >> + } > > Same here, since you are doing wait_event, I assume no lock is held > (unless it's a mutex) and in subsequent places I am not listing. > > As I said, it could well be that it all works fine but it would be worth > mentioning in this case because it's not obvious from simply reading the > code. It has been working well, and yes, your assumption is correct. We can look at adding more/better serialization as a future enhancement. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html