Logs are attached.

On Sat, Jan 31, 2015 at 10:21 AM, cluster lab <cluster.labs@xxxxxxxxx> wrote:
> Log messages:
>
> Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, forming new configuration.
> Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3
> Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1
> Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172........) ; members(old:3 left:1)
> Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service.
> Jan 21 17:07:43 node2 fenced[47840]: fencing node node1
> Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Looking at journal...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Looking at journal...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Acquiring the transaction lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replaying journal...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Acquiring the transaction lock...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replaying journal...
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Found 12 revoke tags
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Journal replayed in 1s
> Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done
> Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks
> Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Found 5 revoke tags
> Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Journal replayed in 1s
> Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done
>
>
> Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, forming new configuration.
> Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3
> Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: sender r(0) ip(172......) ; members(old:3 left:1)
> Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service synchronization, ready to provide service.
> Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1
> Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2
> Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Looking at journal...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Looking at journal...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking at journal...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Trying to acquire journal lock...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Acquiring the transaction lock...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replaying journal...
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replayed 6 of 7 blocks
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Found 1 revoke tags
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Journal replayed in 1s
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done
> Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done
>
> One question: could accessing files before the journal lock is acquired
> cause the problem?
>
> ....
>
> On Sat, Jan 31, 2015 at 9:10 AM, Digimer <lists@xxxxxxxxxx> wrote:
>> Do the logs show whether the fence succeeded or failed? Can you please post the
>> logs from the surviving two nodes starting just before the failure until a
>> few minutes after?
>>
>> digimer
>>
>>
>> On 31/01/15 12:10 AM, cluster lab wrote:
>>>
>>> Some more information:
>>>
>>> The cluster is a three-node cluster.
>>> One of its nodes (ID == 1) was fenced because of a network failure ...
>>>
>>> This problem appeared after the fence ...
>>>
>>>
>>> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab <cluster.labs@xxxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>>
>>>> There isn't any unusual state or message.
>>>> Also, the GFS logs (gfs, dlm) are silent ...
>>>>
>>>> Is there any chance of finding the source of the problem?
>>>>
>>>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson <rpeterso@xxxxxxxxxx> wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>>
>>>>>> On affected node:
>>>>>>
>>>>>> stat FILE | grep Inode
>>>>>> stat: cannot stat `FILE': Input/output error
>>>>>>
>>>>>> On other node:
>>>>>> stat PublicDNS1-OS.qcow2 | grep Inode
>>>>>> Device: fd06h/64774d Inode: 267858 Links: 1
>>>>>
>>>>>
>>>>> Something funky going on.
>>>>> I'd check dmesg for withdraw messages, etc., on the affected node.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bob Peterson
>>>>> Red Hat File Systems
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster@xxxxxxxxxx
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>>
>>
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
Jan 21 17:07:43 ost-pvm2 corosync[47788]: [QUORUM] Members[2]: 2 3
Jan 21 17:07:43 ost-pvm2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 17:07:43 ost-pvm2 kernel: dlm: closing connection to node 1
Jan 21 17:07:43 ost-pvm2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172.16.40.22) ; members(old:3 left:1)
Jan 21 17:07:43 ost-pvm2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1
Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:43 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Busy
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Looking at journal...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage3.1: jid=0: Busy
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:PVM.1: jid=0: Busy
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Looking at journal...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replaying journal...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replaying journal...
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Found 12 revoke tags
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Journal replayed in 1s
Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Done
Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks
Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Found 5 revoke tags
Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Journal replayed in 1s
Jan 21 17:07:58 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage4.1: jid=0: Done
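
In case it helps anyone reading logs like the ones above, here is a rough Python sketch that condenses the GFS2 recovery lines into the last state seen per node, filesystem and journal, and also collects any line mentioning a withdraw (as Bob suggested checking for). It only assumes the syslog layout shown in this thread. The function name summarize_gfs2_recovery and the plain case-insensitive match on "withdraw" are my own choices for illustration, not part of any GFS2 tooling. As far as I understand, "Busy" on one node usually just means another node holds that journal lock, so it is the combined picture across nodes that matters.

```python
import re
from collections import OrderedDict

# Matches kernel lines in the layout seen in this thread, for example:
#   Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Done
GFS2_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+GFS2:\s+"
    r"fsid=(?P<fsid>\S+?):\s+jid=(?P<jid>\d+):\s+(?P<msg>.*)$"
)


def summarize_gfs2_recovery(lines):
    """Return (last_state, withdraws) for an iterable of syslog lines.

    last_state maps (host, fsid, jid) to the last recovery message seen.
    withdraws lists every GFS2 line that mentions "withdraw" (an assumed,
    deliberately loose match).
    """
    last_state = OrderedDict()
    withdraws = []
    for raw in lines:
        line = raw.strip()
        lowered = line.lower()
        if "gfs2" in lowered and "withdraw" in lowered:
            withdraws.append(line)
        match = GFS2_LINE.match(line)
        if match:
            key = (match.group("host"), match.group("fsid"), int(match.group("jid")))
            last_state[key] = match.group("msg").strip()
    return last_state, withdraws


if __name__ == "__main__":
    # A few lines copied from the attached log, used only as sample input.
    sample = [
        "Jan 21 17:07:43 ost-pvm2 fenced[47840]: fencing node ost-pvm1",
        "Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage1.1: jid=0: Busy",
        "Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Replaying journal...",
        "Jan 21 17:07:57 ost-pvm2 kernel: GFS2: fsid=YazdCluster:VMStorage2.1: jid=0: Done",
    ]
    states, withdraws = summarize_gfs2_recovery(sample)
    for (host, fsid, jid), msg in states.items():
        print(f"{host} {fsid} jid={jid}: {msg}")
    print(f"withdraw messages: {len(withdraws)}")
```

Feeding it /var/log/messages from every surviving node (for example by passing an open file object) should show whether each journal ended in "Done" on at least one node.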