On Fri, 27 Mar 2009, Kadlecsik Jozsef wrote:

> The failing node is fenced off. Here are the steps to reproduce the
> freeze of the node:
>
> - all nodes are running and are members of the cluster
> - start the mailman queue manager: the node freezes
> - the frozen node is fenced off by a member of the cluster
> - I can see log messages as I wrote in my first mail:
>
> Mar 26 23:09:24 lxserv1 kernel: dlm: closing connection to node 1
> Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Trying to
> acquire journal lock...
> [...]
>
> - sometimes (but not always) the fencing machine freezes as well
>   and is then fenced off in turn
> - the third node has never frozen so far, so the cluster has remained
>   quorate
> - the fenced-off machines restart, rejoin the cluster and work until I
>   start the mailman queue manager

The daily backups of the whole GFS file systems complete successfully, so I assume it is not filesystem corruption.

Trying to start mailman on another node causes an instant freeze on that node as well, so the problem is node-independent. When the node freezes, the kernel log on the other nodes shows:

Mar 27 08:58:38 lxserv1 kernel: dlm: closing connection to node 3
Mar 27 08:58:38 lxserv1 kernel: GFS: fsid=kfki:home.3: jid=0: Trying to acquire journal lock...
Mar 27 08:58:38 lxserv1 kernel: GFS: fsid=kfki:services.3: jid=0: Trying to acquire journal lock...
Mar 27 08:58:39 lxserv1 kernel: GFS: fsid=kfki:home.3: jid=0: Busy
Mar 27 08:58:39 lxserv1 kernel: GFS: fsid=kfki:services.3: jid=0: Busy

except on one node, where the journal is replayed:

Mar 27 08:58:38 lxserv0 kernel: dlm: closing connection to node 3
Mar 27 08:58:38 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Trying to acquire journal lock...
Mar 27 08:58:38 lxserv0 kernel: GFS: fsid=kfki:home.1: jid=0: Trying to acquire journal lock...
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:home.1: jid=0: Busy
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Looking at journal...
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Acquiring the transaction lock...
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Replaying journal...
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Replayed 342 of 380 blocks
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: replays = 342, skips = 23, sames = 15
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Journal replayed in 1s
Mar 27 08:58:39 lxserv0 kernel: GFS: fsid=kfki:services.1: jid=0: Done

The cman debug log shows three different patterns (it is unfortunate that openais does not timestamp the debug log), like these:

node2:

+[TOTEM] entering GATHER state from 12.
+[TOTEM] entering GATHER state from 0.
+[TOTEM] Saving state aru 284b high seq received 284b
+[TOTEM] Storing new sequence id for ring 3064
+[TOTEM] entering COMMIT state.
+[TOTEM] entering RECOVERY state.
+[TOTEM] position [0] member 192.168.192.6:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [1] member 192.168.192.7:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [2] member 192.168.192.15:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [3] member 192.168.192.18:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] Did not need to originate any messages in recovery.
+[CLM  ] CLM CONFIGURATION CHANGE
+[CLM  ] New Configuration:
+[CLM  ]        r(0) ip(192.168.192.6)
+[CLM  ]        r(0) ip(192.168.192.7)
+[CLM  ]        r(0) ip(192.168.192.15)
+[CLM  ]        r(0) ip(192.168.192.18)
+[CLM  ] Members Left:
+[CLM  ]        r(0) ip(192.168.192.17)

node4:

+[TOTEM] entering GATHER state from 12.
+[TOTEM] entering GATHER state from 0.
+[TOTEM] Creating commit token because I am the rep.
+[TOTEM] Saving state aru 284b high seq received 284b
+[TOTEM] Storing new sequence id for ring 3064
+[TOTEM] entering COMMIT state.
+[TOTEM] entering RECOVERY state.
+[TOTEM] position [0] member 192.168.192.6:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [1] member 192.168.192.7:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [2] member 192.168.192.15:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [3] member 192.168.192.18:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] Did not need to originate any messages in recovery.
+[TOTEM] Sending initial ORF token
+[CLM  ] CLM CONFIGURATION CHANGE
+[CLM  ] New Configuration:
+[CLM  ]        r(0) ip(192.168.192.6)
+[CLM  ]        r(0) ip(192.168.192.7)
+[CLM  ]        r(0) ip(192.168.192.15)
+[CLM  ]        r(0) ip(192.168.192.18)
+[CLM  ] Members Left:
+[CLM  ]        r(0) ip(192.168.192.17)

node5:

+[TOTEM] The token was lost in the OPERATIONAL state.
+[TOTEM] Receive multicast socket recv buffer size (288000 bytes).
+[TOTEM] Transmit multicast socket send buffer size (288000 bytes).
+[TOTEM] entering GATHER state from 2.
+[TOTEM] entering GATHER state from 0.
+[TOTEM] Saving state aru 284b high seq received 284b
+[TOTEM] Storing new sequence id for ring 3064
+[TOTEM] entering COMMIT state.
+[TOTEM] entering RECOVERY state.
+[TOTEM] position [0] member 192.168.192.6:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [1] member 192.168.192.7:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [2] member 192.168.192.15:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] position [3] member 192.168.192.18:
+[TOTEM] previous ring seq 12384 rep 192.168.192.6
+[TOTEM] aru 284b high delivered 284b received flag 1
+[TOTEM] Did not need to originate any messages in recovery.
+[CLM  ] CLM CONFIGURATION CHANGE
+[CLM  ] New Configuration:
+[CLM  ]        r(0) ip(192.168.192.6)
+[CLM  ]        r(0) ip(192.168.192.7)
+[CLM  ]        r(0) ip(192.168.192.15)
+[CLM  ]        r(0) ip(192.168.192.18)
+[CLM  ] Members Left:
+[CLM  ]        r(0) ip(192.168.192.17)

On the victim machine I started

# strace -f -o strace.log /etc/init.d/mailman start

and after the freeze hit Alt+SysRq+S, though that does not guarantee that the whole log file was synced. Nevertheless, it seems mailman went through all the mailing list config files and entered its listening state before the freeze (/var/lib/mailman/locks is a symbolic link pointing to a GFS directory):

[...]
14215 write(4, "/var/lib/mailman/locks/master-qr"..., 50) = 50
14215 close(4) = 0
14215 munmap(0xb7bdb000, 4096) = 0
14215 umask(022) = 02
14215 gettimeofday({1238140699, 692207}, NULL) = 0
14215 utimes("/var/lib/mailman/locks/master-qrunner.web0.14215.1", {1238248699, 692207}) = 0
14215 link("/var/lib/mailman/locks/master-qrunner.web0.14215.1", "/var/lib/mailman/locks/master-qrunner") = -1 EEXIST (File exists)
14215 stat64("/var/lib/mailman/locks/master-qrunner", {st_mode=S_IFREG|0664, st_size=50, ...}) = 0
14215 open("/var/lib/mailman/locks/master-qrunner", O_RDONLY|O_LARGEFILE) = 4
14215 fstat64(4, {st_mode=S_IFREG|0664, st_size=50, ...}) = 0
14215 fstat64(4, {st_mode=S_IFREG|0664, st_size=50, ...}) = 0
14215 _llseek(4, 0, [0], SEEK_CUR) = 0
14215 fstat64(4, {st_mode=S_IFREG|0664, st_size=50, ...}) = 0
14215 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7bdb000
14215 _llseek(4, 0, [0], SEEK_CUR) = 0
14215 read(4, "/var/lib/mailman/locks/master-qr"..., 4096) = 50
14215 read(4, "", 4096) = 0
14215 close(4) = 0
14215 munmap(0xb7bdb000, 4096) = 0
14215 gettimeofday({1238140699, 693608}, NULL) = 0
14215 gettimeofday({1238140699, 693644}, NULL) = 0
14215 stat64("/var/lib/mailman/locks/master-qrunner", {st_mode=S_IFREG|0664, st_size=50, ...}) = 0
14215 select(0, NULL, NULL, NULL, {0, 668025}

I ran 'gfs_tool counters /gfs/services' in a one-second sleep loop on all nodes, but I could not discover anything interesting there.
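For what it's worth, the strace above is the classic link(2)-based lock protocol that Mailman uses for its qrunner lock: write an owner string into a per-process claim file, link() it to the shared lock name (link is atomic even on a shared filesystem), and on EEXIST read the lock's contents to see who holds it, then sleep and retry. A minimal sketch of the idea follows; the names, timeouts, and lifetime are simplified placeholders, not Mailman's actual LockFile code:

```python
import errno
import os
import socket
import time

def acquire(lockfile, timeout=10.0, lifetime=3600):
    """Take `lockfile` via the link(2) trick; return the claim file path."""
    # Per-process claim file: <lockfile>.<host>.<pid>.<seq>
    claim = "%s.%s.%d.1" % (lockfile, socket.gethostname(), os.getpid())
    with open(claim, "w") as f:
        f.write(claim)                      # contents identify the owner
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Push the claim's mtime into the future so other nodes can spot
        # a stale lock (this is the utimes() call visible in the strace).
        now = time.time()
        os.utime(claim, (now, now + lifetime))
        try:
            os.link(claim, lockfile)        # atomic, even on a cluster fs
            return claim                    # we won the race
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        # EEXIST: someone holds the lock; read it to learn the owner.
        with open(lockfile) as f:
            if f.read() == claim:           # our own link landed after all
                return claim
        time.sleep(0.25)                    # the select() sleep in the trace
    os.unlink(claim)
    raise TimeoutError("could not acquire %s" % lockfile)

def release(lockfile, claim):
    os.unlink(lockfile)
    os.unlink(claim)
```

So while that retry loop spins, every iteration does link()/stat()/open()/read() against the same files in the GFS-backed locks directory, which may be relevant to where the node hangs.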
The last entry on the failed node is:

Fri Mar 27 08:58:18 CET 2009
  locks 4630
  locks held 1308
  freeze count 0
  incore inodes 269
  metadata buffers 4
  unlinked inodes 0
  quota IDs 0
  incore log buffers 0
  log space used 0.15%
  meta header cache entries 23
  glock dependencies 1
  glocks on reclaim list 0
  log wraps 1
  outstanding LM calls 0
  outstanding BIO calls 0
  fh2dentry misses 0
  glocks reclaimed 1315287
  glock nq calls 25160827
  glock dq calls 25160295
  glock prefetch calls 1201
  lm_lock calls 340274
  lm_unlock calls 335838
  lm callbacks 677487
  address operations 680976
  dentry operations 9891829
  export operations 0
  file operations 843043
  inode operations 13932689
  super operations 746420
  vm operations 338069
  block I/O reads 0
  block I/O writes 0

In an attempt to trigger the freeze without mailman (in case it is due to a corrupt fs) I ran

find . -type f | while read x; do
    echo $x
    cat $x > /dev/null
done

in the mailman root directory, but it produced nothing on the node where I ran the commands; another node froze at that time, though.

I'd be glad for any suggestion on how to solve the node freezes.

Best regards,
Jozsef
--
E-mail : kadlec@xxxxxxxxxxxx, kadlec@xxxxxxxxxxxxxxxxx
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster