Hi ...,

I have a 2-node Pacemaker cluster built on CentOS 7.6.1810. It serves files using NFS and Samba.

Every 15 - 20 minutes the rpc.statd service fails, and the whole NFS service is restarted. After investigation, it was found that the service fails after a few rounds of monitoring by Pacemaker. Pacemaker's monitor script runs the following commands to check whether all the services are running -
---------------------------------------------------------------------------------------------------------------------------------------
rpcinfo > /dev/null 2>&1
rpcinfo -t localhost 100005 > /dev/null 2>&1
nfs_exec status nfs-idmapd > $fn 2>&1
rpcinfo -t localhost 100024 > /dev/null 2>&1
---------------------------------------------------------------------------------------------------------------------------------------
The monitor is scheduled to run every 20 seconds.

This is the message we get in the logs -
-------------------------------------------------------------------------------------------------------------------------------------
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
-------------------------------------------------------------------------------------------------------------------------------------
About 13 seconds later, we get this message -
-------------------------------------------------------------------------------------------------------------------------------------
Jul 09 07:34:09 virat-nd01 nfsserver(virat-nfs-daemon)[54087]: ERROR: rpc-statd is not running
-------------------------------------------------------------------------------------------------------------------------------------
Once we get this error, the NFS service is automatically restarted.

The "ERROR: rpc-statd is not running" message comes from the monitor function of Pacemaker's nfsserver resource agent; I have pasted that part of the script below. I disabled monitoring, and everything has been working fine since then, but I can't keep cluster monitoring disabled forever. (A standalone reproduction of the monitor probes, and the commands I plan to use to re-enable the monitor, are sketched after the script.)

Kindly help.

Regards,
Indivar Nair

Part of the resource agent that does the monitoring (/usr/lib/ocf/resource.d/heartbeat/nfsserver):
=======================================================================
nfsserver_systemd_monitor()
{
	local threads_num
	local rc
	local fn

	ocf_log debug "Status: rpcbind"
	rpcinfo > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "rpcbind is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: nfs-mountd"
	rpcinfo -t localhost 100005 > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "nfs-mountd is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: nfs-idmapd"
	fn=`mktemp`
	nfs_exec status nfs-idmapd > $fn 2>&1
	rc=$?
	ocf_log debug "$(cat $fn)"
	rm -f $fn
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "nfs-idmapd is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: rpc-statd"
	rpcinfo -t localhost 100024 > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "rpc-statd is not running"
		return $OCF_NOT_RUNNING
	fi

	nfs_exec is-active nfs-server
	rc=$?
	# Now systemctl is-active can't detect the failure of kernel process like nfsd.
	# So, if the return value of systemctl is-active is 0, check the threads number
	# to make sure the process is running really.
	# /proc/fs/nfsd/threads has the numbers of the nfsd threads.
	if [ $rc -eq 0 ]; then
		threads_num=`cat /proc/fs/nfsd/threads 2>/dev/null`
		if [ $? -eq 0 ]; then
			if [ $threads_num -gt 0 ]; then
				return $OCF_SUCCESS
			else
				return 3
			fi
		else
			return $OCF_ERR_GENERIC
		fi
	fi

	return $rc
}
=======================================================================
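
To narrow down whether rpc.statd is actually dying or merely failing to answer the rpcinfo probe at that moment, I am planning to run a small standalone loop alongside the cluster that repeats the same probes the agent uses. This is only a rough sketch: the 20-second interval and the program numbers 100005/100024 are taken from the script above, while the log file path is just an example I made up.
-------------------------------------------------------------------------------------------------------------------------------------
#!/bin/bash
# Standalone reproduction of the resource agent's monitor probes, every 20 seconds.
# On a statd probe failure it also records the rpc-statd unit state and PIDs,
# so we can tell whether statd died or only stopped answering RPC.
LOG=/var/log/nfs-monitor-debug.log   # example path

while true; do
    ts=$(date '+%b %d %H:%M:%S')

    rpcinfo > /dev/null 2>&1                     || echo "$ts rpcbind probe failed" >> "$LOG"
    rpcinfo -t localhost 100005 > /dev/null 2>&1 || echo "$ts mountd (100005) probe failed" >> "$LOG"

    if ! rpcinfo -t localhost 100024 > /dev/null 2>&1; then
        unit_state=$(systemctl is-active rpc-statd 2>/dev/null)
        echo "$ts statd (100024) probe failed; rpc-statd unit is '$unit_state'; pids: $(pgrep -d, rpc.statd)" >> "$LOG"
    fi

    sleep 20
done
-------------------------------------------------------------------------------------------------------------------------------------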
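
And once the root cause is found, this is roughly how I intend to re-enable the monitor operation, with a longer interval and timeout while testing. The resource name virat-nfs-daemon is taken from the log line above, and the interval/timeout values are only examples, not a recommendation:
-------------------------------------------------------------------------------------------------------------------------------------
# Adjust the monitor operation on the nfsserver resource
# (resource name taken from the log above; values are examples only)
pcs resource update virat-nfs-daemon op monitor interval=60s timeout=40s

# Confirm the configured operations
pcs resource show virat-nfs-daemon
-------------------------------------------------------------------------------------------------------------------------------------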