On 20 Nov 2024, at 10:22, Chuck Lever wrote: > On Wed, Nov 20, 2024 at 09:09:35AM -0500, Benjamin Coddington wrote: >> If we're unable to register a SCSI device, ensure we mark the device as >> unavailable so that it will timeout and be re-added via GETDEVINFO. This >> avoids repeated doomed attempts to register a device in the IO path. >> >> Add some clarifying comments as well. >> >> Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") >> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> >> Reviewed-by: Christoph Hellwig <hch@xxxxxx> >> --- >> fs/nfs/blocklayout/blocklayout.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c >> index 0becdec12970..b36bc2f4f7e2 100644 >> --- a/fs/nfs/blocklayout/blocklayout.c >> +++ b/fs/nfs/blocklayout/blocklayout.c >> @@ -571,19 +571,29 @@ bl_find_get_deviceid(struct nfs_server *server, >> if (!node) >> return ERR_PTR(-ENODEV); >> >> + /* >> + * Devices that are marked unavailable are left in the cache with a >> + * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or >> + * constantly attempting to register the device. Once marked as >> + * unavailable they must be deleted and never reused. >> + */ >> if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { >> unsigned long end = jiffies; >> unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; >> >> if (!time_in_range(node->timestamp_unavailable, start, end)) { >> + /* Force a new GETDEVINFO for this LAYOUT */ > > Or perhaps: "Uncork subsequent GETDEVINFO operations for this device" > <shrug> Sure, ok! >> nfs4_delete_deviceid(node->ld, node->nfs_client, id); >> goto retry; >> } >> goto out_put; >> } >> >> - if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) >> + /* If we cannot register, treat this device as transient */ > > How about "Make a negative cache entry for this device" Hmm - that's closer to the dentry language rather than how we refer to temporary error cases in device land. For me the "transient" has some hopeful meaning as in we expect this might work in the future - but I'm ok changing this comment. There will be some NFS clients that might try to do pNFS SCSI but will never actually have the devices locally, and so that's not a "transient" situation. This can only fixed today with export policy. > >> + if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { >> + nfs4_mark_deviceid_unavailable(node); >> goto out_put; >> + } >> >> return node; >> >> -- >> 2.47.0 >> > > It took me a bit to understand what this patch does. It is like > setting up a negative dentry so the local device cache absorbs > bursts of checks for the device. OK. Yes, its like the layout error handling, but for devices. Its not obvious at this layer, but every IO wants to do LAYOUTGET, then figure out which device GETDEVINFO, then here we need to prep the device with a reservation. Its a lot of slow work that makes a mess of IO latencies if one of the later steps is going to fail for awhile. > Just an observation: Negative caching has some consequences too. > For instance, there will now be a period where, if the device > happens to become available, the layout is still unusable. I wonder > if that's going to have some undesirable operational effects. It sure does, but I don't think there's a simple way to get notified that a SCSI device has re-appeared or has started supporting persistent reservations. Ben