Re: [PATCH] pNFS: make DS availability problems visible in log

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ben

  Yes you're right always some intervention is required but I guess
something on logs is a bit more exposed to log scan/monitoring tools
from the start,
and in general quicker to notice. I understand patch is way too chatty though.

  But  in the event of i.e. a mount, and no DS is available and we
divert I/O to MDS, i don't think its a bad idea notify
that scenario on logs, at least once, can give user a hint some part
of the pNFS infrastructure went away. That was my original point, but
since life
goes on through MDS maybe is not worth a more alarming log message, i
can go just for tracepoints.

rgds
roberto

On Fri, Mar 5, 2021 at 2:42 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> There's always going to be some sort of user intervention, even if the
> intervention is to read the system's logs.  What does the solution to
> notice DS problems ideally look like to the user?  Instead of "check the
> logs for DS problems", couldn't you replace that with "run this script
> to check for DS problems"?
>
> Ben
>
> On 5 Mar 2021, at 7:29, Roberto Bergantinos Corpas wrote:
>
> > Trond, Chuck
> >
> > thanks for feedback. I understand the point
> >
> > Sure we can turn those printks into tracepoints, done. However
> > original point was flag somehow DS issues
> > outside debug, i.e. user intervention. What about just a pr_warn_once
> > just for the case when at _nfs4_pnfs_v4_ds_connect
> > we could not reach any DS ?
> >
> > On Thu, Mar 4, 2021 at 9:35 PM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On Thu, 2021-03-04 at 20:20 +0000, Chuck Lever wrote:
> >>> Hello Roberto!
> >>>
> >>>> On Mar 3, 2021, at 10:33 AM, Roberto Bergantinos Corpas <
> >>>> rbergant@xxxxxxxxxx> wrote:
> >>>>
> >>>> Would be interesting to promote DS availability logging outside
> >>>> debug
> >>>> so that we are more aware that I/O is diverted to MDS and some part
> >>>> of the infraestructure failed.
> >>>>
> >>>> Also added logging for failed DS connection attempts.
> >>>
> >>> Given that this enables remote system behavior to generate
> >>> kernel log traffic that can fill the local root partition,
> >>> I'd like to see either:
> >>>
> >>> - the explicit use of rate limiting, or
> >>>
> >>> - these dprintks replaced with tracepoints
> >>
> >> I cannot accept a pr_warn(), even a rate limited one, for a timeout
> >> error or for a connection error in the data path. Those are just too
> >> nasty to deal with in a syslog.
> >>
> >> Tracepoints would be acceptable.
> >>
> >>>
> >>>
> >>>> Signed-off-by: Roberto Bergantinos Corpas <rbergant@xxxxxxxxxx>
> >>>> ---
> >>>> fs/nfs/filelayout/filelayout.c         | 4 ++--
> >>>> fs/nfs/flexfilelayout/flexfilelayout.c | 6 +++---
> >>>> fs/nfs/pnfs_nfs.c                      | 6 +++++-
> >>>> 3 files changed, 10 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfs/filelayout/filelayout.c
> >>>> b/fs/nfs/filelayout/filelayout.c
> >>>> index 7f5aa0403e16..fef2d31a501a 100644
> >>>> --- a/fs/nfs/filelayout/filelayout.c
> >>>> +++ b/fs/nfs/filelayout/filelayout.c
> >>>> @@ -181,7 +181,7 @@ static int filelayout_async_handle_error(struct
> >>>> rpc_task *task,
> >>>>         case -EIO:
> >>>>         case -ETIMEDOUT:
> >>>>         case -EPIPE:
> >>>> -               dprintk("%s DS connection error %d\n", __func__,
> >>>> +               pr_warn("%s DS connection error %d\n", __func__,
> >>>>                         task->tk_status);
> >>>>                 nfs4_mark_deviceid_unavailable(devid);
> >>>>                 pnfs_error_mark_layout_for_return(inode, lseg);
> >>>> @@ -190,7 +190,7 @@ static int filelayout_async_handle_error(struct
> >>>> rpc_task *task,
> >>>>                 fallthrough;
> >>>>         default:
> >>>> reset:
> >>>> -               dprintk("%s Retry through MDS. Error %d\n",
> >>>> __func__,
> >>>> +               pr_warn("%s Retry through MDS. Error %d\n",
> >>>> __func__,
> >>>>                         task->tk_status);
> >>>>                 return -NFS4ERR_RESET_TO_MDS;
> >>>>         }
> >>>> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> >>>> b/fs/nfs/flexfilelayout/flexfilelayout.c
> >>>> index a163533446fa..7150d94e80e6 100644
> >>>> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> >>>> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> >>>> @@ -1129,7 +1129,7 @@ static int
> >>>> ff_layout_async_handle_error_v4(struct rpc_task *task,
> >>>>         case -EIO:
> >>>>         case -ETIMEDOUT:
> >>>>         case -EPIPE:
> >>>> -               dprintk("%s DS connection error %d\n", __func__,
> >>>> +               pr_warn("%s DS connection error %d\n", __func__,
> >>>>                         task->tk_status);
> >>>>                 nfs4_delete_deviceid(devid->ld, devid->nfs_client,
> >>>>                                 &devid->deviceid);
> >>>> @@ -1139,7 +1139,7 @@ static int
> >>>> ff_layout_async_handle_error_v4(struct rpc_task *task,
> >>>>                 if (ff_layout_avoid_mds_available_ds(lseg))
> >>>>                         return -NFS4ERR_RESET_TO_PNFS;
> >>>> reset:
> >>>> -               dprintk("%s Retry through MDS. Error %d\n",
> >>>> __func__,
> >>>> +               pr_warn("%s Retry through MDS. Error %d\n",
> >>>> __func__,
> >>>>                         task->tk_status);
> >>>>                 return -NFS4ERR_RESET_TO_MDS;
> >>>>         }
> >>>> @@ -1167,7 +1167,7 @@ static int
> >>>> ff_layout_async_handle_error_v3(struct rpc_task *task,
> >>>>                 nfs_inc_stats(lseg->pls_layout->plh_inode,
> >>>> NFSIOS_DELAY);
> >>>>                 goto out_retry;
> >>>>         default:
> >>>> -               dprintk("%s DS connection error %d\n", __func__,
> >>>> +               pr_warn("%s DS connection error %d\n", __func__,
> >>>>                         task->tk_status);
> >>>>                 nfs4_delete_deviceid(devid->ld, devid->nfs_client,
> >>>>                                 &devid->deviceid);
> >>>> diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
> >>>> index 679767ac258d..322661a48348 100644
> >>>> --- a/fs/nfs/pnfs_nfs.c
> >>>> +++ b/fs/nfs/pnfs_nfs.c
> >>>> @@ -934,8 +934,11 @@ static int _nfs4_pnfs_v4_ds_connect(struct
> >>>> nfs_server *mds_srv,
> >>>>                                                 (struct sockaddr
> >>>> *)&da->da_addr,
> >>>>                                                 da->da_addrlen,
> >>>> IPPROTO_TCP,
> >>>>                                                 timeo, retrans,
> >>>> minor_version);
> >>>> -                       if (IS_ERR(clp))
> >>>> +                       if (IS_ERR(clp)) {
> >>>> +                               pr_warn("%s: DS: %s unable to
> >>>> connect with address %s, error: %ld\n",
> >>>> +                                       __func__, ds->ds_remotestr,
> >>>> da->da_remotestr, PTR_ERR(clp));
> >>>>                                 continue;
> >>>> +                       }
> >>>>
> >>>>                         status = nfs4_init_ds_session(clp,
> >>>>                                         mds_srv->nfs_client-
> >>>>> cl_lease_time);
> >>>> @@ -949,6 +952,7 @@ static int _nfs4_pnfs_v4_ds_connect(struct
> >>>> nfs_server *mds_srv,
> >>>>         }
> >>>>
> >>>>         if (IS_ERR(clp)) {
> >>>> +               pr_warn("%s: no DS available\n", __func__);
> >>>>                 status = PTR_ERR(clp);
> >>>>                 goto out;
> >>>>         }
> >>>> --
> >>>> 2.21.0
> >>>>
> >>>
> >>> --
> >>> Chuck Lever
> >>>
> >>>
> >>>
> >>
> >> --
> >> Trond Myklebust
> >> Linux NFS client maintainer, Hammerspace
> >> trond.myklebust@xxxxxxxxxxxxxxx
> >>
> >>
>




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux