We have been using a two-node distributed glusterfs cluster which is served with nfs-ganesha for a bit over two years. After a reboot of both nodes a few weeks ago, we started having trouble with nfs-ganesha repeatedly becoming unavailable. Within a few hours of being started via systemctl, the logs report that "Health status is unhealthy", followed by either a hang with an "rpc_clnt_ping_timer_expired" message (in which case systemctl won't restart it, and the process would need to be forcibly killed before restart) or a crash of the nfs-ganesha process. This happens as much as every few hours, less so during weekends when there is less usage, but we haven't been able to match the behavior with any particular I/O load or other specific activity. The system is somewhat out of date, so I'll apt upgrade pretty soon, but I was hoping to first get some tips on how I can otherwise troubleshoot the issue or even just minimize the symptoms. RAID6 checks look great on both nodes, the gluster peers and volume seem healthy -- sharing via glusterfs's native method does not appear to have this problem. CPU and memory usage are well within their limits. Network access looks fine. This pair of machines serves no function other than sharing files. Versions: glusterfs-server 3.13.2-1ubuntu1 nfs-ganesha 2.6.0-2 $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic $ uname -a Linux yuzz 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux /etc/ganesha/ganesha.conf, sans empty/comment lines: EXPORT { Export_Id = 1729; Path = "/nfs"; Pseudo = "/nfs"; Access_Type = RW; Squash = No_Root_Squash; SecType = "sys"; FSAL { Name = "GLUSTER"; Hostname = localhost; Volume = "gv0"; Up_poll_usec = 10; # Upcall poll interval in microseconds Transport = tcp; # tcp or rdma } Contents of /var/log/ganesha/ganesha.log, with the last few "start...hang, start...hang" cycles shown: 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34940[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] glusterfs_create_export :FSAL :EVENT :Volume gv0 exported at : '/' 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0) 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED 20/09/2021 13:58:06 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 20/09/2021 13:59:36 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE 20/09/2021 14:01:13 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11338, old: 11330; deq new: 11254, old: 11254 20/09/2021 14:01:18 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11345, old: 11338; deq new: 11254, old: 11254 20/09/2021 14:01:19 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11347, old: 11345; deq new: 11254, old: 11254 20/09/2021 14:01:24 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11354, old: 11347; deq new: 11254, old: 11254 20/09/2021 14:01:29 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11369, old: 11358; deq new: 11257, old: 11257 20/09/2021 14:01:32 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11370, old: 11369; deq new: 11257, old: 11257 20/09/2021 14:01:34 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11378, old: 11370; deq new: 11257, old: 11257 20/09/2021 14:01:39 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11387, old: 11378; deq new: 11257, old: 11257 20/09/2021 14:01:40 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11388, old: 11387; deq new: 11257, old: 11257 20/09/2021 14:01:44 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11397, old: 11388; deq new: 11257, old: 11257 20/09/2021 14:01:49 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11407, old: 11397; deq new: 11257, old: 11257 20/09/2021 14:01:51 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11409, old: 11407; deq new: 11257, old: 11257 20/09/2021 14:01:59 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11481, old: 11470; deq new: 11308, old: 11308 20/09/2021 14:02:01 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11483, old: 11481; deq new: 11308, old: 11308 20/09/2021 14:02:04 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11498, old: 11491; deq new: 11316, old: 11316 20/09/2021 14:02:05 : epoch 6148cbae : yuzz : ganesha.nfsd-34948[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 11500, old: 11498; deq new: 11316, old: 11316 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35521[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] glusterfs_create_export :FSAL :EVENT :Volume gv0 exported at : '/' 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0) 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED 20/09/2021 14:02:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 20/09/2021 14:03:44 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE 20/09/2021 21:27:39 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 976517, old: 976516; deq new: 976516, old: 976516 21/09/2021 03:43:13 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 1289488, old: 1289487; deq new: 1289487, old: 1289487 21/09/2021 09:31:00 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 2134266, old: 2134259; deq new: 2134161, old: 2134161 21/09/2021 09:31:09 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 2134270, old: 2134269; deq new: 2134164, old: 2134164 21/09/2021 09:31:14 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 2134271, old: 2134270; deq new: 2134164, old: 2134164 21/09/2021 09:31:16 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 2134273, old: 2134271; deq new: 2134164, old: 2134164 21/09/2021 09:31:36 : epoch 6148cca6 : yuzz : ganesha.nfsd-35529[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 2134275, old: 2134274; deq new: 2134165, old: 2134165 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11930[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] glusterfs_create_export :FSAL :EVENT :Volume gv0 exported at : '/' 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0) 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED 21/09/2021 09:31:50 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[main] nfs_start :NFS STARTUP :EVENT :------------------------------------------------- 21/09/2021 09:33:20 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE 21/09/2021 11:04:14 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 458296, old: 458293; deq new: 458254, old: 458254 21/09/2021 11:04:17 : epoch 6149dec6 : yuzz : ganesha.nfsd-11937[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 458454, old: 458388; deq new: 458257, old: 458257 21/09/2021 11:04:59 : epoch 6149f49b : yuzz : ganesha.nfsd-14569[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0 ________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-users