On 8/9/22 2:21 AM, Christoph Hellwig wrote: > On Mon, Aug 08, 2022 at 07:04:11PM -0500, Mike Christie wrote: >> To handle both cases, this patch adds a blk_status_t arg to the pr_ops >> callouts. The lower levels will convert their device specific error to >> the blk_status_t then the upper levels can easily check that code >> without knowing the device type. It also allows us to keep userspace >> compat where it expects a negative -Exyz error code if the command fails >> before it's sent to the device or a device/tranport specific value if the >> error is > 0. > > Why do we need two return values here? I know the 2 return values are gross :) I can do it in one, but I wasn't sure what's worse. See below for the other possible solutions. I think they are all bad. 0. Convert device specific conflict error to -EBADE then back: sd_pr_command() ..... /* would add similar check for NVME_SC_RESERVATION_CONFLICT in nvme */ if (result == SAM_STAT_CHECK_CONDITION) return -EBADE; else return result; LIO then just checks for -EBADE but when going to userspace we have to convert: blkdev_pr_register() ... result = ops->pr_register() if (result < 0) { /* For compat we must convert back to the nvme/scsi code */ if (result == -EBADE) { /* need some helper for this that calls down the stack */ if (bdev == SCSI) return SAM_STAT_RESERVATION_CONFLICT else return NVME_SC_RESERVATION_CONFLICT } else return blk_status_to_str(result) } else return result; The conversion is kind of gross and I was thinking in the future it's going to get worse. I'm going to want to have more advanced error handling in LIO and dm-multipath. Like dm-multipath wants to know if an pr_op failed because of a path failure, so it can retry another one, or a hard device/target error. It would be nice for LIO if an PGR had bad/illegal values and the device returned an error than I could detect that. 1. Drop the -Exyz error type and use blk_status_t in the kernel: sd_pr_command() ..... if (result < 0) return -errno_to_blk_status(result); else if (result == SAM_STAT_CHECK_CONDITION) return -BLK_STS_NEXUS; else return result; blkdev_pr_register() ... result = ops->pr_register() if (result < 0) { /* For compat we must convert back to the nvme/scsi code */ if (result == -BLK_STS_NEXUS) { /* need some helper for this that calls down the stack */ if (bdev == SCSI) return SAM_STAT_RESERVATION_CONFLICT else return NVME_SC_RESERVATION_CONFLICT } else return blk_status_to_str(result) } else return result; This has similar issues as #0 where we have to convert before returning to userspace. Note: In this case, if the block layer uses an -Exyz error code there's not BLK_STS for then we would return -EIO to userspace now. I was thinking that might not be ok but I could also just add a BLK_STS error code for errors like EINVAL, EWOULDBLOCK, ENOMEM, etc so that doesn't happen. 2. We could do something like below where the low levels are not changed but the caller converts: sd_pr_command() /* no changes */ lio() result = ops->pr_register() if (result > 0) { /* add some stacked helper again that goes through dm and * to the low level device */ if (bdev == SCSI) { result = scsi_result_to_blk_status(result) else result = nvme_error_status(result) This looks simple, but it felt wrong having upper layers having to know the device type and calling conversion functions.