On Thu, Jun 25, 2020 at 05:43:53PM +0000, Kraus, Sebastian wrote:
> Dear Bruce,
> I got the following stack and back trace:
>
> root@all:~# coredumpctl debug
> PID: 6356 (rpc.gssd)
> UID: 0 (root)
> GID: 0 (root)
> Signal: 11 (SEGV)
> Timestamp: Thu 2020-06-25 11:46:08 CEST (3h 4min ago)
> Command Line: /usr/sbin/rpc.gssd -vvvvvvv -rrrrrrr -t 3600 -T 10
> Executable: /usr/sbin/rpc.gssd
> Control Group: /system.slice/rpc-gssd.service
> Unit: rpc-gssd.service
> Slice: system.slice
> Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXX
> Machine ID: YYYYYYYYYYYYYYYYYYYYYYYYYYYY
> Hostname: XYZ
> Storage: /var/lib/systemd/coredump/core.rpc\x2egssd.0.7f31136228274af0a1a855b91ad1e75c.6356.1593078368000000.lz4
> Message: Process 6356 (rpc.gssd) of user 0 dumped core.
>
> Stack trace of thread 14174:
> #0  0x000056233fff038e n/a (rpc.gssd)
> #1  0x000056233fff09f8 n/a (rpc.gssd)
> #2  0x000056233fff0b92 n/a (rpc.gssd)
> #3  0x000056233fff13b3 n/a (rpc.gssd)
> #4  0x00007fb2eb8dbfa3 start_thread (libpthread.so.0)
> #5  0x00007fb2eb80c4cf __clone (libc.so.6)
>
> Stack trace of thread 6356:
> #0  0x00007fb2eb801819 __GI___poll (libc.so.6)
> #1  0x00007fb2eb6e7207 send_dg (libresolv.so.2)
> #2  0x00007fb2eb6e4c43 __GI___res_context_query (libresolv.so.2)
> #3  0x00007fb2eb6bf536 __GI__nss_dns_gethostbyaddr2_r (libnss_dns.so.2)
> #4  0x00007fb2eb6bf823 _nss_dns_gethostbyaddr_r (libnss_dns.so.2)
> #5  0x00007fb2eb81dee2 __gethostbyaddr_r (libc.so.6)
> #6  0x00007fb2eb8267d5 gni_host_inet_name (libc.so.6)
> #7  0x000056233ffef455 n/a (rpc.gssd)
> #8  0x000056233ffef82c n/a (rpc.gssd)
> #9  0x000056233fff01d0 n/a (rpc.gssd)
> #10 0x00007fb2ebab49ba n/a (libevent-2.1.so.6)
> #11 0x00007fb2ebab5537 event_base_loop (libevent-2.1.so.6)
> #12 0x000056233ffedeaa n/a (rpc.gssd)
> #13 0x00007fb2eb73709b __libc_start_main (libc.so.6)
> #14 0x000056233ffee03a n/a (rpc.gssd)
>
> GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> [...]
> Reading symbols from /usr/sbin/rpc.gssd...(no debugging symbols found)...done.
> [New LWP 14174]
> [New LWP 6356]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/rpc.gssd -vvvvvvv -rrrrrrr -t 3600 -T 10'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000056233fff038e in ?? ()
> [Current thread is 1 (Thread 0x7fb2eaeba700 (LWP 14174))]
> (gdb) bt
> #0  0x000056233fff038e in ?? ()
> #1  0x000056233fff09f8 in ?? ()
> #2  0x000056233fff0b92 in ?? ()
> #3  0x000056233fff13b3 in ?? ()
> #4  0x00007fb2eb8dbfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #5  0x00007fb2eb80c4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) quit
>
> I am not an expert in analyzing stack and back traces. Is there anything
> meaningful you are able to extract from the trace?
> As far as I can see, thread 14174 caused the segmentation violation just
> after its birth on clone.
> Please correct me if I am in error.
> Seems Debian Buster does not ship any dedicated package with debug
> symbols for the rpc.gssd executable.

Have you reported a Debian bug?  They might know how to get a good trace
out of it.
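
Without symbols for those rpc.gssd frames, the backtrace doesn't tell us
much beyond the fact that one of gssd's threads died.  For what it's
worth, Debian stopped shipping separate debug packages in the main
archive a while back; the automatically built -dbgsym packages live in a
separate "debug" archive instead.  Untested, and the archive and package
names below are from memory, but assuming rpc.gssd still comes from the
nfs-common package, something like this should get you symbols:

    # enable Debian's separate debug-symbol archive (names from
    # memory, untested):
    echo "deb http://deb.debian.org/debian-debug/ buster-debug main" \
        >/etc/apt/sources.list.d/buster-debug.list
    apt update
    apt install nfs-common-dbgsym

With that installed, "coredumpctl debug" should resolve the n/a frames
in rpc.gssd to real function names and source lines.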

--b.

> So far, I was not able to find such a package.
> What's your opinion about the trace?
>
> Best and Thanks
> Sebastian
>
> _____________________________
> Sebastian Kraus
> Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin
>
> Email: sebastian.kraus@xxxxxxxxxxxx
>
> ________________________________________
> From: linux-nfs-owner@xxxxxxxxxxxxxxx <linux-nfs-owner@xxxxxxxxxxxxxxx> on behalf of J. Bruce Fields <bfields@xxxxxxxxxxxx>
> Sent: Tuesday, June 23, 2020 00:36
> To: Kraus, Sebastian
> Cc: linux-nfs@xxxxxxxxxxxxxxx
> Subject: Re: RPC Pipefs: Frequent parsing errors in client database
>
> On Sat, Jun 20, 2020 at 09:08:55PM +0000, Kraus, Sebastian wrote:
> > Hi Bruce,
> >
> > >> But I think it'd be more useful to stay focused on the segfaults.
> >
> > Is it a clever idea to analyze core dumps? Or are there other much
> > better debugging techniques w.r.t. RPC daemons?
>
> If we could at least get a backtrace out of the core dump, that could
> be useful.
>
> > I now do more tests while fiddling around with the time-out parameters
> > "-T" and "-t" on the command line of rpc.gssd.
> >
> > There are several things I do not really understand about the trace
> > shown below:
> >
> > 1) How can it be useful that the rpc.gssd daemon tries to parse the
> > info file although it knows about its absence beforehand?
>
> It doesn't know beforehand, in the scenarios I described.
>
> > 2) Why are there two identifiers clnt36e and clnt36f being used for
> > the same client?
>
> This is actually happening on an NFS server; the rpc client in question
> is the callback client used to do things like send delegation recalls
> back to the NFS client.
>
> I'm not sure why two different callback clients are being created here,
> but there's nothing inherently weird about that.
>
> > 3) What does the <?> in "inotify event for clntdir (nfsd4_cb/clnt36e) -
> > ev->wd (600) ev->name (<?>) ev->mask (0x00008000)" mean?
>
> Off the top of my head, I don't know; we'd probably need to look through
> header files or inotify man pages for the definitions of those masks.
>
> --b.
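
Following up on 3): if I'm reading glibc's <sys/inotify.h> correctly,
that mask is IN_IGNORED:

    #define IN_IGNORED   0x00008000   /* File was ignored.  */

inotify(7) says IN_IGNORED is generated when a watch is removed, either
explicitly or because the watched file or directory was itself deleted.
Such events carry no file name, which is presumably why gssd prints
"<?>" for ev->name.  So that message is most likely just gssd seeing
the clnt36e directory go away.

--b.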