On Wed, Jan 25 2012 at 4:39pm -0500, Moger, Babu <Babu.Moger@xxxxxxxxxx> wrote: > > -----Original Message----- > > From: Mike Snitzer [mailto:snitzer@xxxxxxxxxx] > > Sent: Tuesday, January 24, 2012 5:03 PM > > To: Moger, Babu > > Cc: linux-scsi@xxxxxxxxxxxxxxx; device-mapper development (dm- > > devel@xxxxxxxxxx) > > Subject: Re: [PATCH 2/2] scsi : fixing the new host byte settings > > (DID_TARGET_FAILURE and DID_NEXUS_FAILURE) > > > > Hi Babu, > > > > Thanks for finding this. > > > > On Tue, Jan 24 2012 at 3:38pm -0500, > > Moger, Babu <Babu.Moger@xxxxxxxxxx> wrote: > > > > > Resubmitting as my previous post had format issues and did not go linux-scsi. > > > > > > This patch fixes the host byte settings DID_TARGET_FAILURE and > > DID_NEXUS_FAILURE. > > > The function __scsi_error_from_host_byte, tries to reset the host byte to > > DID_OK. But that > > > does not happen because of the OR operation. > > > > > > Here is the flow. > > > scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte > > > > or more accurately: > > > > scsi_softirq_done -> scsi_decide_disposition > > scsi_softirq_done -> scsi_finish_command -> scsi_io_completion -> > > __scsi_error_from_host_byte > > > > > Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, > > result will be set as > > > DID_NEXUS_FAILURE (=0x11). Then in __scsi_error_from_host_byte, when we > > do OR with > > > DID_OK. Purpose is to reset it back to DID_OK. But that does not happen. This > > patch fixes this issue. > > > > We clearly aren't properly resetting to DID_OK but I'm not seeing an > > obvious "nasty bug" that is lurking due to this. Am I missing > > something? > > Yes. It is causing some issues in our proprietary multipath driver. Normally, our assumption > is that host status overrides all other statuses. If host status is set to status other than DID_OK > then we normally ignore other statuses(like reading the check sense). We have worked this around. > My assumption is, most of the user Level code does the same thing. It might give wrong impression > about the kind of error. > > One question.. Did the newlines wrapped in this patch also? Looks fine to me. > > __scsi_error_from_host_byte() is setting error which is passed back up > > via blk_end_request() and blk_end_request_all(). And in my previous > > testing I know that corresponding errors are making it out to > > dm-multipath (e.g. -EREMOTEIO). > > > > Also, your patch header is missing the location where DID_OK is not > > properly matched (because it wasn't set exclussively due to being > > I am not sure what you meant here. Well like I said, it is clear that scsi_noretry_cmd() won't match the DID_OK case in the host_byte select statement. I was just wondering where else this improperly set DID_OK was causing a DID_OK match to not happen. But the fact that this is impacting your proprietary multipath driver basically answers my question (I was trying to understand what was ultimately broken as a result of us improperly resetting to DID_OK). All said: Acked-by: Mike Snitzer <snitzer@xxxxxxxxxx> -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel