Hi Trond, I'm trying to figure existing code in order to fix a problem. Can you please help? Say there was an IO to the DS and it timeouts. Pnfs code in the filelayout_async_handle_error() marks the device unavailable and marks the layout to be returned. Timed out IO is retried against the MDS. The new IO tries to do the pnfs and because the device is marked unavailable (returning EINVAL) it propagates back all the way to the pg_init setup and fails the IO with EINVAL. But IO shouldn't fail. My question is why are we returning the status of the filelayout_check_deviceid(), I propose that instead we at that point set lseg=NULL and then the IO would be redirected to the MDS. Or maybe I'm missing something else? My proposed fix would be: diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 61f46facb39c..b3e8ba3bd654 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -904,7 +904,7 @@ fl_pnfs_update_layout(struct inode *ino, status = filelayout_check_deviceid(lo, fl, gfp_flags); if (status) { pnfs_put_lseg(lseg); - lseg = ERR_PTR(status); + lseg = NULL; } out: return lseg;