On Thu, Jun 25, 2020 at 05:43:53PM +0000, Kraus, Sebastian wrote:
> Dear Bruce,
> I got the following stack and back trace:
>
> root@all:~# coredumpctl debug
> PID: 6356 (rpc.gssd)
> UID: 0 (root)
> GID: 0 (root)
> Signal: 11 (SEGV)
> Timestamp: Thu 2020-06-25 11:46:08 CEST (3h 4min ago)
> Command Line: /usr/sbin/rpc.gssd -vvvvvvv -rrrrrrr -t 3600 -T 10
> Executable: /usr/sbin/rpc.gssd
> Control Group: /system.slice/rpc-gssd.service
> Unit: rpc-gssd.service
> Slice: system.slice
> Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXX
> Machine ID: YYYYYYYYYYYYYYYYYYYYYYYYYYYY
> Hostname: XYZ
> Storage: /var/lib/systemd/coredump/core.rpc\x2egssd.0.7f31136228274af0a1a855b91ad1e75c.6356.1593078368000000.lz4
> Message: Process 6356 (rpc.gssd) of user 0 dumped core.
>
> Stack trace of thread 14174:
> #0  0x000056233fff038e n/a (rpc.gssd)
> #1  0x000056233fff09f8 n/a (rpc.gssd)
> #2  0x000056233fff0b92 n/a (rpc.gssd)
> #3  0x000056233fff13b3 n/a (rpc.gssd)
> #4  0x00007fb2eb8dbfa3 start_thread (libpthread.so.0)
> #5  0x00007fb2eb80c4cf __clone (libc.so.6)
>
> Stack trace of thread 6356:
> #0  0x00007fb2eb801819 __GI___poll (libc.so.6)
> #1  0x00007fb2eb6e7207 send_dg (libresolv.so.2)
> #2  0x00007fb2eb6e4c43 __GI___res_context_query (libresolv.so.2)
> #3  0x00007fb2eb6bf536 __GI__nss_dns_gethostbyaddr2_r (libnss_dns.so.2)
> #4  0x00007fb2eb6bf823 _nss_dns_gethostbyaddr_r (libnss_dns.so.2)
> #5  0x00007fb2eb81dee2 __gethostbyaddr_r (libc.so.6)
> #6  0x00007fb2eb8267d5 gni_host_inet_name (libc.so.6)
> #7  0x000056233ffef455 n/a (rpc.gssd)
> #8  0x000056233ffef82c n/a (rpc.gssd)
> #9  0x000056233fff01d0 n/a (rpc.gssd)
> #10 0x00007fb2ebab49ba n/a (libevent-2.1.so.6)
> #11 0x00007fb2ebab5537 event_base_loop (libevent-2.1.so.6)
> #12 0x000056233ffedeaa n/a (rpc.gssd)
> #13 0x00007fb2eb73709b __libc_start_main (libc.so.6)
> #14 0x000056233ffee03a n/a (rpc.gssd)
>
> GNU gdb (Debian 8.2.1-2+b3) 8.2.1
> [...]
> Reading symbols from /usr/sbin/rpc.gssd...(no debugging symbols found)...done.
> [New LWP 14174]
> [New LWP 6356]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/rpc.gssd -vvvvvvv -rrrrrrr -t 3600 -T 10'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000056233fff038e in ?? ()
> [Current thread is 1 (Thread 0x7fb2eaeba700 (LWP 14174))]
> (gdb) bt
> #0  0x000056233fff038e in ?? ()
> #1  0x000056233fff09f8 in ?? ()
> #2  0x000056233fff0b92 in ?? ()
> #3  0x000056233fff13b3 in ?? ()
> #4  0x00007fb2eb8dbfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #5  0x00007fb2eb80c4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) quit
>
> I am not an expert in analyzing stack and back traces. Is there anything
> meaningful you are able to extract from the trace?
> As far as I can see, thread 14174 caused the segmentation violation just
> after its birth on clone.
> Please correct me if I am in error.
> Seems Debian Buster does not ship any dedicated package with debug
> symbols for the rpc.gssd executable.

Have you reported a Debian bug?  They might know how to get a good trace
out of it.
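
Without symbols for those rpc.gssd frames, the backtrace doesn't tell us
much beyond the fact that one of gssd's threads died.  For what it's
worth, Debian stopped shipping separate debug packages in the main
archive a while back; the automatically built -dbgsym packages live in a
separate "debug" archive instead.  Untested, and the archive and package
names below are from memory, but assuming rpc.gssd still comes from the
nfs-common package, something like this should get you symbols:

    # enable Debian's separate debug-symbol archive (names from
    # memory, untested):
    echo "deb http://deb.debian.org/debian-debug/ buster-debug main" \
        >/etc/apt/sources.list.d/buster-debug.list
    apt update
    apt install nfs-common-dbgsym

With that installed, "coredumpctl debug" should resolve the n/a frames
in rpc.gssd to real function names and source lines.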

--b.

> So far, I was not able to find such a package.
> What's your opinion about the trace?
>
> Best and Thanks
> Sebastian
>
> _____________________________
> Sebastian Kraus
> Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin
>
> Email: sebastian.kraus@xxxxxxxxxxxx
>
> ________________________________________
> From: linux-nfs-owner@xxxxxxxxxxxxxxx <linux-nfs-owner@xxxxxxxxxxxxxxx> on behalf of J. Bruce Fields <bfields@xxxxxxxxxxxx>
> Sent: Tuesday, June 23, 2020 00:36
> To: Kraus, Sebastian
> Cc: linux-nfs@xxxxxxxxxxxxxxx
> Subject: Re: RPC Pipefs: Frequent parsing errors in client database
>
> On Sat, Jun 20, 2020 at 09:08:55PM +0000, Kraus, Sebastian wrote:
> > Hi Bruce,
> >
> > >> But I think it'd be more useful to stay focused on the segfaults.
> >
> > Is it a clever idea to analyze core dumps? Or are there other much
> > better debugging techniques w.r.t. RPC daemons?
>
> If we could at least get a backtrace out of the core dump, that could
> be useful.
>
> > I now do more tests while fiddling around with the time-out parameters
> > "-T" and "-t" on the command line of rpc.gssd.
> >
> > There are several things I do not really understand about the trace
> > shown below:
> >
> > 1) How can it be useful that the rpc.gssd daemon tries to parse the
> > info file although it knows about its absence beforehand?
>
> It doesn't know beforehand, in the scenarios I described.
>
> > 2) Why are there two identifiers clnt36e and clnt36f being used for
> > the same client?
>
> This is actually happening on an NFS server; the rpc client in question
> is the callback client used to do things like send delegation recalls
> back to the NFS client.
>
> I'm not sure why two different callback clients are being created here,
> but there's nothing inherently weird about that.
>
> > 3) What does the <?> in "inotify event for clntdir (nfsd4_cb/clnt36e) -
> > ev->wd (600) ev->name (<?>) ev->mask (0x00008000)" mean?
>
> Off the top of my head, I don't know; we'd probably need to look through
> header files or inotify man pages for the definitions of those masks.
>
> --b.
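
Following up on 3): if I'm reading glibc's <sys/inotify.h> correctly,
that mask is IN_IGNORED:

    #define IN_IGNORED   0x00008000   /* File was ignored.  */

inotify(7) says IN_IGNORED is generated when a watch is removed, either
explicitly or because the watched file or directory was itself deleted.
Such events carry no file name, which is presumably why gssd prints
"<?>" for ev->name.  So that message is most likely just gssd seeing
the clnt36e directory go away.

--b.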