Re: RPC Pipefs: Frequent parsing errors in client database

Hi Bruce,
OK, here is the somewhat longer story 8-O:

I maintain several virtualized NFS server instances running on the VMware ESXi hypervisor. The operating system is Debian Stretch/Buster.
Most of the time, the NFS servers are nearly idle, and there is only moderate CPU load during rush hours. So the servers are far from being overloaded.

Anyway, more than a year ago, the rpc.gssd daemon started becoming unstable in production use.
The daemon triggers segmentation violations several times a day, at irregular intervals and, unfortunately, without any obvious reason. :-(

The observed violations look like this:
Jun 11 21:52:08 all kernel: rpc.gssd[12043]: segfault at 0 ip 000056065d50e38e sp 00007fde27ffe880 error 4 in rpc.gssd[56065d50b000+9000]
or like this:
Mar 17 10:32:10 all kernel: rpc.gssd[25793]: segfault at ffffffffffffffc0 ip 00007ffa61f246e4 sp 00007ffa6145f0f8 error 5 in libc-2.24.so[7ffa61ea4000+195000]
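
For reference, here is a minimal sketch (Python, not part of nfs-utils) of how the two kernel lines above can be decoded. The function name decode_segfault and the field names are my own choices. The interpretation of the "error" value assumes the x86 page-fault error code layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The computed offset is only relative to the start of the mapping the kernel reports; turning it into a function name would additionally need the mapping's file offset and matching debug symbols.

```python
import re

# Pattern for the kernel's "segfault at ..." message as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault log line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return {
        "object": m["obj"],                      # mapping that contains the faulting ip
        "fault_addr": int(m["addr"], 16),        # address whose access faulted
        "offset": int(m["ip"], 16) - int(m["base"], 16),  # ip relative to mapping start
        # Assumption: x86 page-fault error code bits.
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    lines = [
        "Jun 11 21:52:08 all kernel: rpc.gssd[12043]: segfault at 0 ip 000056065d50e38e "
        "sp 00007fde27ffe880 error 4 in rpc.gssd[56065d50b000+9000]",
        "Mar 17 10:32:10 all kernel: rpc.gssd[25793]: segfault at ffffffffffffffc0 "
        "ip 00007ffa61f246e4 sp 00007ffa6145f0f8 error 5 in libc-2.24.so[7ffa61ea4000+195000]",
    ]
    for line in lines:
        info = decode_segfault(line)
        print(info["object"], hex(info["offset"]), info["mode"], info["access"], info["cause"])
```

Read this way, the first crash is a user-mode read of address 0 at offset 0x338e inside the rpc.gssd mapping, which looks like a NULL pointer dereference; the second is a user-mode read at offset 0x806e4 inside the libc mapping.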

To work around the problem in a quick and dirty way, I enabled automatic restart of the rpc-gssd.service unit on failure (Restart=on-failure).


Several months ago, I decided to investigate the problem further by launching the rpc.svcgssd and rpc.gssd daemons with an increased debug level from their service units.
Sadly, this did not give me any clue about the root cause of these strange segmentation violations.

Some of my colleagues urged me to migrate the server instances from Debian Stretch (current oldstable) to Debian Buster (current stable).
They argued that rpc.gssd's crashes might be rooted in instabilities of the NFS stack. So, about three weeks ago, I upgraded two of my server instances.
Unexpectedly, not only did the problem not disappear, but the frequency of the segmentation violations even increased slightly.

Debian Stretch ships with nfs-common v1.3.4-2.1 and Buster with nfs-common v1.3.4-2.5. So both are based on the same nfs-common point release.


Consequently, about a week ago, I decided to investigate the problem in more depth by stracing the rpc.gssd daemon while it is running.
Since then, the segmentation violations have been gone, but now lots of complaints of the following type appear in the system log:

 Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
 Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info


This behaviour seems somewhat strange to me.
But one possible explanation could be: rpc.gssd runs more slowly while being straced, and the "true" reason for the segmentation violations surfaces.
I would argue that rpc.gssd trying to parse non-existent files points, in any case, to faulty behaviour of the RPC GSS user space daemon implementation.
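
To make clear what kind of race I have in mind, here is a small sketch in Python. It is purely illustrative and is not how rpc.gssd is actually implemented; the function read_client_info and the temporary directory standing in for a pipefs client directory such as nfsd4_cb/clnt3bb are hypothetical. It only assumes that a client directory can disappear between the moment the daemon learns about it and the moment it opens the "info" file, and shows a reader that treats this as a vanished client rather than as a parse failure.

```python
import os
import shutil
import tempfile


def read_client_info(client_dir):
    """Return the contents of <client_dir>/info, or None if the client is gone."""
    path = os.path.join(client_dir, "info")
    try:
        with open(path, "r") as fh:
            return fh.read()
    except FileNotFoundError:
        # The directory (or its info file) was removed after we were told about it.
        # Under the assumption above this is a benign race, so report "no client"
        # instead of continuing with data that was never read.
        return None


if __name__ == "__main__":
    # Stand-in for a pipefs client directory; the file content is made up.
    client_dir = tempfile.mkdtemp(prefix="clnt")
    with open(os.path.join(client_dir, "info"), "w") as fh:
        fh.write("placeholder info content\n")

    print(read_client_info(client_dir))   # file exists: its content is returned

    shutil.rmtree(client_dir)             # simulate the client going away
    print(read_client_info(client_dir))   # directory is gone: None
```

If something like this is what happens, the interesting question is what the daemon does with its per-client state after the failed open.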


Best regards, and have a nice weekend
Sebastian


Sebastian Kraus
Team IT am Institut für Chemie
Gebäude C, Straße des 17. Juni 115, Raum C7

Technische Universität Berlin
Fakultät II
Institut für Chemie
Sekretariat C3
Straße des 17. Juni 135
10623 Berlin

________________________________________
From: J. Bruce Fields <bfields@xxxxxxxxxxxx>
Sent: Saturday, June 20, 2020 00:04
To: Kraus, Sebastian
Cc: linux-nfs@xxxxxxxxxxxxxxx
Subject: Re: RPC Pipefs: Frequent parsing errors in client database

On Fri, Jun 19, 2020 at 09:24:27PM +0000, Kraus, Sebastian wrote:
> Hi all,
> for several weeks now, I have regularly been seeing errors like the following in the system log of one of my NFSv4 file servers:
>
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: can't open nfsd4_cb/clnt3bb/info: No such file or directory
> Jun 19 11:14:00 all rpc.gssd[23620]: ERROR: failed to parse nfsd4_cb/clnt3bb/info

I'm not sure what exactly is happening.

Are the log messages the only problem you're seeing, or is there some
other problem?

--b.

>
> Looks like premature closing of client connections.
> The security flavor of the NFS export is set to krb5p (integrity+privacy).
>
> Does anyone have a hint on how to track down the problem efficiently?
>
>
> Best and thanks
> Sebastian
>
>
> Sebastian Kraus
> Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin



