Log messages:

Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, forming new configuration.
Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3
Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1
Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172........) ; members(old:3 left:1)
Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 17:07:43 node2 fenced[47840]: fencing node node1
Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Looking at journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Looking at journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replaying journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replaying journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Found 12 revoke tags
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Journal replayed in 1s
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Found 5 revoke tags
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Journal replayed in 1s
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done

Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, forming new configuration.
Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3
Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: sender r(0) ip(172......) ; members(old:3 left:1)
Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1
Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2
Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replaying journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replayed 6 of 7 blocks
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Found 1 revoke tags
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Journal replayed in 1s
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done

One question: could accessing files before the journal lock is acquired cause the problem?

On Sat, Jan 31, 2015 at 9:10 AM, Digimer <lists@xxxxxxxxxx> wrote:
> Do the logs show whether the fence succeeded or failed? Can you please post the
> logs from the surviving two nodes, starting just before the failure until a
> few minutes after?
>
> digimer
>
> On 31/01/15 12:10 AM, cluster lab wrote:
>>
>> Some more information:
>>
>> This is a three-node cluster.
>> One of its nodes (ID == 1) was fenced because of a network failure ...
>>
>> After the fence, this problem appeared ...
>>
>> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab <cluster.labs@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> There isn't any unusual state or message,
>>> and the GFS logs (gfs, dlm) are silent too ...
>>>
>>> Is there any chance of finding the source of the problem?
>>>
>>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson <rpeterso@xxxxxxxxxx> wrote:
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> On affected node:
>>>>>
>>>>> stat FILE | grep Inode
>>>>> stat: cannot stat `FILE': Input/output error
>>>>>
>>>>> On other node:
>>>>> stat PublicDNS1-OS.qcow2 | grep Inode
>>>>> Device: fd06h/64774d  Inode: 267858  Links: 1
>>>>
>>>> Something funky is going on.
>>>> I'd check dmesg for withdraw messages, etc., on the affected node.
>>>>
>>>> Regards,
>>>>
>>>> Bob Peterson
>>>> Red Hat File Systems
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster@xxxxxxxxxx
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
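As a footnote to Digimer's question, the posted node2 log can itself be checked for a fence result. The sketch below greps the one `fenced` line quoted above; in practice you would run the same greps against the full /var/log/messages. The `fence <node> success` pattern is an assumption based on typical fenced syslog output on the RHEL 6 cluster stack, so verify the exact wording against your own logs.

```shell
# Did the fence complete? The posted log shows the attempt; the open
# question is whether a result line ever followed it.
# NOTE: the "fence ... success" pattern below is an assumption -- check
# the exact wording fenced uses on your cluster.
log='Jan 21 17:07:43 node2 fenced[47840]: fencing node node1'

attempts=$(printf '%s\n' "$log" | grep -c 'fencing node')
results=$(printf '%s\n' "$log" | grep -c 'fence .* success')

echo "attempts=$attempts results=$results"   # prints: attempts=1 results=0
```

If `attempts` is nonzero but `results` stays zero against the full log, the fence agent may never have reported success, and DLM/GFS2 recovery would then be suspect; the journal replay shown in the logs only proceeds once fenced reports the node dead. Bob's complementary check on the node returning Input/output error would be something like `dmesg | grep -i gfs2`, looking for withdraw messages.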