On 25 Jun 2024, at 16:02, cel@xxxxxxxxxx wrote:

> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> During generic/069 runs with pNFS SCSI layouts, the NFS client emits
> the following in the system journal:
>
> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
> kernel: sd 6:0:0:1: reservation conflict
> kernel: sd 6:0:0:1: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> kernel: sd 6:0:0:1: [sdb] tag#16 CDB: Write(10) 2a 00 00 00 00 50 00 00 08 00
> kernel: reservation conflict error, dev sdb, sector 80 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
> kernel: sd 6:0:0:1: reservation conflict
> kernel: sd 6:0:0:1: reservation conflict
> kernel: sd 6:0:0:1: [sdb] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> kernel: sd 6:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> kernel: sd 6:0:0:1: [sdb] tag#18 CDB: Write(10) 2a 00 00 00 00 60 00 00 08 00
> kernel: sd 6:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 00 00 00 58 00 00 08 00
> kernel: reservation conflict error, dev sdb, sector 96 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> kernel: reservation conflict error, dev sdb, sector 88 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> systemd[1]: fstests-generic-069.scope: Deactivated successfully.
> systemd[1]: fstests-generic-069.scope: Consumed 5.092s CPU time.
> systemd[1]: media-test.mount: Deactivated successfully.
> systemd[1]: media-scratch.mount: Deactivated successfully.
> kernel: sd 6:0:0:1: reservation conflict
> kernel: failed to unregister PR key.
>
> This appears to be due to a race.
> bl_alloc_lseg() calls this:
>
> static struct nfs4_deviceid_node *
> bl_find_get_deviceid(struct nfs_server *server,
>                 const struct nfs4_deviceid *id, const struct cred *cred,
>                 gfp_t gfp_mask)
> {
>         struct nfs4_deviceid_node *node;
>         unsigned long start, end;
>
> retry:
>         node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
>         if (!node)
>                 return ERR_PTR(-ENODEV);
>
> nfs4_find_get_deviceid() does a lookup without the spin lock first.
> If it can't find a matching deviceid, it creates a new device_info
> (which calls bl_alloc_deviceid_node, and that registers the device's
> PR key).
>
> Then it takes the nfs4_deviceid_lock and looks up the deviceid again.
> If it finds it this time, bl_find_get_deviceid() frees the spare
> (new) device_info, which unregisters the PR key for the same device.
>
> Any subsequent I/O from this client on that device gets EBADE.
>
> The umount later unregisters the device's PR key again.
>
> To prevent this problem, register the PR key after the deviceid_node
> lookup.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  fs/nfs/blocklayout/blocklayout.c | 25 +++++----
>  fs/nfs/blocklayout/blocklayout.h |  9 +++-
>  fs/nfs/blocklayout/dev.c         | 91 ++++++++++++++++++++++++--------
>  3 files changed, 94 insertions(+), 31 deletions(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 6be13e0ec170..0becdec12970 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -564,25 +564,32 @@ bl_find_get_deviceid(struct nfs_server *server,
>                  gfp_t gfp_mask)
>  {
>          struct nfs4_deviceid_node *node;
> -        unsigned long start, end;
> +        int err = -ENODEV;

Just a nit - this err var seems unnecessary, especially as we still do ..

>  retry:
>          node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
>          if (!node)
>                  return ERR_PTR(-ENODEV);

.. this, which seems clearer.
Looking at the return at the bottom makes me think 'err' could be
something else, but it can't.

Looks good to me otherwise.

Reviewed-by: Benjamin Coddington <bcodding@xxxxxxxxxx>

Ben
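
For anyone following along, the race in the patch description can be replayed in a small userspace model. This is only a sketch under stated assumptions: fake_device, fake_node, node_alloc(), node_free() and key_survives_lost_race() are hypothetical names invented for illustration, not kernel code, and the "PR key" is reduced to a single boolean. It replays the losing side of the race deterministically, with no real threads or SCSI commands.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the registration state kept by the target. */
struct fake_device {
    bool key_registered;
};

/* Hypothetical stand-in for a deviceid node; not the kernel structure. */
struct fake_node {
    struct fake_device *dev;
    bool registered_in_alloc; /* freeing such a node unregisters the key */
};

/* Old ordering: allocation itself registers the key. */
static void node_alloc(struct fake_node *n, struct fake_device *dev,
                       bool register_in_alloc)
{
    n->dev = dev;
    n->registered_in_alloc = register_in_alloc;
    if (register_in_alloc)
        dev->key_registered = true;
}

/* Freeing a node undoes whatever its allocation registered. */
static void node_free(struct fake_node *n)
{
    if (n->registered_in_alloc)
        n->dev->key_registered = false;
}

/*
 * Replay the losing side of the race. Returns true if the key is still
 * registered once the loser has dropped its spare node.
 */
bool key_survives_lost_race(bool register_in_alloc)
{
    struct fake_device dev = { .key_registered = false };
    struct fake_node winner, spare;

    /* The winner allocates its node and gets it into the cache. */
    node_alloc(&winner, &dev, register_in_alloc);
    if (!register_in_alloc)
        dev.key_registered = true; /* new ordering: register after lookup */

    /* The loser's unlocked lookup missed, so it allocates a spare. */
    node_alloc(&spare, &dev, register_in_alloc);

    /* The locked lookup then finds the winner, so the spare is freed. */
    node_free(&spare);

    /* New ordering: the loser registers only after settling on a node. */
    if (!register_in_alloc)
        dev.key_registered = true;

    return dev.key_registered;
}
```

In this model, key_survives_lost_race(true) follows the old ordering and ends with the key unregistered while the winning node is still cached, which is the state that produces the reservation conflicts in the journal. key_survives_lost_race(false) follows the ordering the patch describes, where freeing the spare has nothing to undo.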