On 8/9/22 2:33 PM, Bart Van Assche wrote: > On 8/9/22 11:08, Mike Christie wrote: >> On 8/9/22 2:21 AM, Christoph Hellwig wrote: >>> On Mon, Aug 08, 2022 at 07:04:11PM -0500, Mike Christie wrote: >>>> To handle both cases, this patch adds a blk_status_t arg to the pr_ops >>>> callouts. The lower levels will convert their device specific error to >>>> the blk_status_t then the upper levels can easily check that code >>>> without knowing the device type. It also allows us to keep userspace >>>> compat where it expects a negative -Exyz error code if the command fails >>>> before it's sent to the device or a device/tranport specific value if the >>>> error is > 0. >>> >>> Why do we need two return values here? >> >> I know the 2 return values are gross :) I can do it in one, but I wasn't sure >> what's worse. See below for the other possible solutions. I think they are all >> bad. >> >> >> 0. Convert device specific conflict error to -EBADE then back: >> >> sd_pr_command() >> >> ..... >> >> /* would add similar check for NVME_SC_RESERVATION_CONFLICT in nvme */ >> if (result == SAM_STAT_CHECK_CONDITION) >> return -EBADE; >> else >> return result; >> >> >> LIO then just checks for -EBADE but when going to userspace we have to >> convert: >> >> >> blkdev_pr_register() >> >> ... >> result = ops->pr_register() >> if (result < 0) { >> /* For compat we must convert back to the nvme/scsi code */ >> if (result == -EBADE) { >> /* need some helper for this that calls down the stack */ >> if (bdev == SCSI) >> return SAM_STAT_RESERVATION_CONFLICT >> else >> return NVME_SC_RESERVATION_CONFLICT >> } else >> return blk_status_to_str(result) >> } else >> return result; >> >> >> The conversion is kind of gross and I was thinking in the future it's going >> to get worse. I'm going to want to have more advanced error handling in LIO >> and dm-multipath. Like dm-multipath wants to know if an pr_op failed because >> of a path failure, so it can retry another one, or a hard device/target error. >> It would be nice for LIO if an PGR had bad/illegal values and the device >> returned an error than I could detect that. >> >> >> 1. Drop the -Exyz error type and use blk_status_t in the kernel: >> >> sd_pr_command() >> >> ..... >> if (result < 0) >> return -errno_to_blk_status(result); >> else if (result == SAM_STAT_CHECK_CONDITION) >> return -BLK_STS_NEXUS; >> else >> return result; >> >> blkdev_pr_register() >> >> ... >> result = ops->pr_register() >> if (result < 0) { >> /* For compat we must convert back to the nvme/scsi code */ >> if (result == -BLK_STS_NEXUS) { >> /* need some helper for this that calls down the stack */ >> if (bdev == SCSI) >> return SAM_STAT_RESERVATION_CONFLICT >> else >> return NVME_SC_RESERVATION_CONFLICT >> } else >> return blk_status_to_str(result) >> } else >> return result; >> >> This has similar issues as #0 where we have to convert before returning to >> userspace. >> >> >> Note: In this case, if the block layer uses an -Exyz error code there's not >> BLK_STS for then we would return -EIO to userspace now. I was thinking >> that might not be ok but I could also just add a BLK_STS error code >> for errors like EINVAL, EWOULDBLOCK, ENOMEM, etc so that doesn't happen. >> >> >> 2. We could do something like below where the low levels are not changed but the >> caller converts: >> >> sd_pr_command() >> /* no changes */ >> >> lio() >> result = ops->pr_register() >> if (result > 0) { >> /* add some stacked helper again that goes through dm and >> * to the low level device >> */ >> if (bdev == SCSI) { >> result = scsi_result_to_blk_status(result) >> else >> result = nvme_error_status(result) >> >> >> This looks simple, but it felt wrong having upper layers having to >> know the device type and calling conversion functions. > > Has it been considered to introduce a new enumeration type instead of choosing (0), (1) or (2)? > The problem is that userspace currently gets the nvme status value or the scsi_cmnd->result which can be host/status byte values like with SG IO. So you could you just do a new enum or add every possible error to blk_status_t but before passing back to userspace you still have to then convert to what format userspace is getting today. So for scsi devices, you have to mimic the host_byte.