On Tue, Jun 23, 2020 at 11:25:53AM +0000, Niklas Cassel wrote:
> On Tue, Jun 23, 2020 at 01:53:47AM -0700, Sagi Grimberg wrote:
> > On 6/22/20 9:25 AM, Keith Busch wrote:
> > > -		len = nvme_process_ns_desc(ctrl, ids, cur);
> > > +		len = nvme_process_ns_desc(ctrl, ids, cur, &csi_seen);
> > >  		if (len < 0)
> > >  			goto free_data;
> > >  		len += sizeof(*cur);
> > >  	}
> > >  free_data:
> > > +	if (!status && nvme_multi_css(ctrl) && !csi_seen) {
> >
> > We will clear the status if we detect a path error, that is to
> > avoid needlessly removing the ns for path failures, so you should
> > check at the goto site.
>
> The problem is that this check has to be done after checking all the ns descs,
> so this check has to be done as the final thing, at least after processing all
> the ns descs. No matter if nvme_process_ns_desc() returned an error, or if
> simply NVME_NIDT_CSI wasn't part of the ns desc list, so the loop reached the
> end without error.
>
> Even if the nvme command failed and the status was cleared:
>
>	if (status > 0 && !(status & NVME_SC_DNR))
>		status = 0;

This check is so weird. What has DNR got to do with whether or not we
want to continue with this namespace? The commit that adds this says
it's to check for a host failed IO, but a controller can just as validly
set DNR in its error status, in which case we'd still want to clear the
status.
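
To make the concern concrete, here is a minimal userspace sketch of the
quoted check. The helper name clear_status_unless_dnr() is hypothetical,
and the status values are what I recall from include/linux/nvme.h, so
treat them as assumptions rather than as the authoritative definitions.
It only shows that the check looks at the DNR bit alone and so cannot
tell a host-generated failure from a controller error that happens to
have DNR set.

```c
/*
 * Assumed values, recalled from include/linux/nvme.h. Verify against
 * the tree you are working on before relying on them.
 */
#define NVME_SC_INVALID_OPCODE	0x1
#define NVME_SC_HOST_PATH_ERROR	0x370
#define NVME_SC_DNR		0x4000

/*
 * Hypothetical helper wrapping the check quoted above: a positive
 * (NVMe) status without DNR is cleared, anything else is passed
 * through unchanged, including negative errno values.
 */
int clear_status_unless_dnr(int status)
{
	if (status > 0 && !(status & NVME_SC_DNR))
		status = 0;
	return status;
}
```

With these assumed values, clear_status_unless_dnr(NVME_SC_HOST_PATH_ERROR)
yields 0, while a controller-reported
NVME_SC_INVALID_OPCODE | NVME_SC_DNR is returned as is. Nothing in the
check records who set the bit.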