> On Jan 19, 2023, at 14:40, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote: > > On Thu, Jan 19, 2023 at 1:24 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: >> >> >> >>> On Jan 19, 2023, at 12:50, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote: >>> >>> If the call to GETDEVICEINFO fails, the client fallback to doing IO >>> to MDS but on every new IO call, the client tries to get the device >>> again. Instead, mark the layout as unavailable as well. This way >>> the client will re-try after a timeout period. >>> >>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx> >>> --- >>> fs/nfs/filelayout/filelayout.c | 1 + >>> fs/nfs/pnfs.c | 7 +++++++ >>> fs/nfs/pnfs.h | 2 ++ >>> 3 files changed, 10 insertions(+) >>> >>> diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c >>> index 4974cd18ca46..13df85457cf5 100644 >>> --- a/fs/nfs/filelayout/filelayout.c >>> +++ b/fs/nfs/filelayout/filelayout.c >>> @@ -862,6 +862,7 @@ fl_pnfs_update_layout(struct inode *ino, >>> >>> status = filelayout_check_deviceid(lo, fl, gfp_flags); >>> if (status) { >>> + pnfs_mark_layout_unavailable(lo, iomode); >>> pnfs_put_lseg(lseg); >>> lseg = NULL; >>> } >>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c >>> index a5db5158c634..bac15dcf99bb 100644 >>> --- a/fs/nfs/pnfs.c >>> +++ b/fs/nfs/pnfs.c >>> @@ -491,6 +491,13 @@ pnfs_layout_set_fail_bit(struct pnfs_layout_hdr *lo, int fail_bit) >>> refcount_inc(&lo->plh_refcount); >>> } >>> >>> +void >>> +pnfs_mark_layout_unavailable(struct pnfs_layout_hdr *lo, enum pnfs_iomode fail_bit) >>> +{ >>> + pnfs_layout_set_fail_bit(lo, pnfs_iomode_to_fail_bit(fail_bit)); >> >> I suggest rather using pnfs_layout_io_set_failed() so that we also evict the layout segment that references this unrecognised deviceid. In fact, there is already an exported function pnfs_set_lo_fail() (which could definitely do with a better name!) that does this. > > I'm not opposed to this approach. In the proposed patch, I treated it > as the layout being still valid but unavailable. My question is: I > think we need to return the layout before doing this, correct? Should > I be making changes to export something like > pnfs_set_plh_return_info() or would adding a new function to pnfs.c > that does pnfs_set_lo_fail and returns the layout? I suggest rather changing the call to pnfs_mark_matching_lsegs_invalid() in pnfs_layout_io_set_failed() and make it call pnfs_mark_matching_lsegs_return() instead. > > something like > +void pnfs_invalidate_return_layout(struct pnfs_layout_segment *lseg) > +{ > + pnfs_layout_io_set_failed(lseg->pls_layout, lseg->pls_range.iomode); > + pnfs_set_plh_return_info(lseg->pls_layout, lseg->pls_range.iomode, 0); ^^^^^^^ this won’t suffice to invalidate the layout. > +} > +EXPORT_SYMBOL_GPL(pnfs_invalidate_return_layout); > > status = filelayout_check_deviceid(lo, fl, gfp_flags); > if (status) { > + pnfs_invalidate_return_layout(lseg); > pnfs_put_lseg(lseg); > lseg = NULL; > >> >>> +} >>> +EXPORT_SYMBOL_GPL(pnfs_mark_layout_unavailable); >>> + >>> static void >>> pnfs_layout_clear_fail_bit(struct pnfs_layout_hdr *lo, int fail_bit) >>> { >>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h >>> index e3e6a41f19de..9f47bd883fc3 100644 >>> --- a/fs/nfs/pnfs.h >>> +++ b/fs/nfs/pnfs.h >>> @@ -343,6 +343,8 @@ void pnfs_error_mark_layout_for_return(struct inode *inode, >>> void pnfs_layout_return_unused_byclid(struct nfs_client *clp, >>> enum pnfs_iomode iomode); >>> >>> +void pnfs_mark_layout_unavailable(struct pnfs_layout_hdr *lo, >>> + enum pnfs_iomode iomode); >>> /* nfs4_deviceid_flags */ >>> enum { >>> NFS_DEVICEID_INVALID = 0, /* set when MDS clientid recalled */ >>> -- >>> 2.31.1 >>> _________________________________ Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx