Log messages:

Jan 21 17:07:31 node2 corosync[47788]: [TOTEM ] A processor failed, forming new configuration.
Jan 21 17:07:43 node2 corosync[47788]: [QUORUM] Members[2]: 2 3
Jan 21 17:07:43 node2 corosync[47788]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 17:07:43 node2 kernel: dlm: closing connection to node 1
Jan 21 17:07:43 node2 corosync[47788]: [CPG ] chosen downlist: sender r(0) ip(172........) ; members(old:3 left:1)
Jan 21 17:07:43 node2 corosync[47788]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 17:07:43 node2 fenced[47840]: fencing node node1
Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:43 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage1.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Looking at journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage3.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VM.1: jid=0: Busy
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Looking at journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replaying journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replaying journal...
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Replayed 250 of 515 blocks
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Found 12 revoke tags
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Journal replayed in 1s
Jan 21 17:07:57 node2 kernel: GFS2: fsid=Cluster:VMStorage2.1: jid=0: Done
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Replayed 4260 of 4803 blocks
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Found 5 revoke tags
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Journal replayed in 1s
Jan 21 17:07:58 node2 kernel: GFS2: fsid=Cluster:VMStorage4.1: jid=0: Done

Jan 21 17:07:31 node3 corosync[51444]: [TOTEM ] A processor failed, forming new configuration.
Jan 21 17:07:43 node3 corosync[51444]: [QUORUM] Members[2]: 2 3
Jan 21 17:07:43 node3 corosync[51444]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 17:07:43 node3 corosync[51444]: [CPG ] chosen downlist: sender r(0) ip(172......) ; members(old:3 left:1)
Jan 21 17:07:43 node3 corosync[51444]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 17:07:43 node3 kernel: dlm: closing connection to node 1
Jan 21 17:07:43 node3 fenced[51496]: fencing deferred to node2
Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:43 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage2.2: jid=0: Busy
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Looking at journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage3.2: jid=0: Done
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Trying to acquire journal lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage4.2: jid=0: Busy
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Acquiring the transaction lock...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replaying journal...
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Replayed 6 of 7 blocks
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Found 1 revoke tags
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Journal replayed in 1s
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VMStorage1.2: jid=0: Done
Jan 21 17:07:57 node3 kernel: GFS2: fsid=Cluster:VM.2: jid=0: Done

One question: could accessing files before the journal lock is acquired cause the problem?

On Sat, Jan 31, 2015 at 9:10 AM, Digimer <lists@xxxxxxxxxx> wrote:
> Do the logs show whether the fence succeeded or failed? Can you please post the
> logs from the surviving two nodes, starting just before the failure until a
> few minutes after?
>
> digimer
>
> On 31/01/15 12:10 AM, cluster lab wrote:
>>
>> Some more information:
>>
>> This is a three-node cluster.
>> One of its nodes (ID == 1) was fenced because of a network failure ...
>>
>> After the fence, this problem appeared ...
>>
>> On Sat, Jan 31, 2015 at 8:28 AM, cluster lab <cluster.labs@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> There isn't any unusual state or message,
>>> and the GFS logs (gfs, dlm) are silent too ...
>>>
>>> Is there any chance of finding the source of the problem?
>>>
>>> On Thu, Jan 29, 2015 at 7:04 PM, Bob Peterson <rpeterso@xxxxxxxxxx> wrote:
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> On affected node:
>>>>>
>>>>> stat FILE | grep Inode
>>>>> stat: cannot stat `FILE': Input/output error
>>>>>
>>>>> On other node:
>>>>> stat PublicDNS1-OS.qcow2 | grep Inode
>>>>> Device: fd06h/64774d  Inode: 267858  Links: 1
>>>>
>>>> Something funky is going on.
>>>> I'd check dmesg for withdraw messages, etc., on the affected node.
>>>>
>>>> Regards,
>>>>
>>>> Bob Peterson
>>>> Red Hat File Systems
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster@xxxxxxxxxx
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
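As a footnote to Digimer's question, the posted node2 log can itself be checked for a fence result. The sketch below greps the one `fenced` line quoted above; in practice you would run the same greps against the full /var/log/messages. The `fence <node> success` pattern is an assumption based on typical fenced syslog output on the RHEL 6 cluster stack, so verify the exact wording against your own logs.

```shell
# Did the fence complete? The posted log shows the attempt; the open
# question is whether a result line ever followed it.
# NOTE: the "fence ... success" pattern below is an assumption -- check
# the exact wording fenced uses on your cluster.
log='Jan 21 17:07:43 node2 fenced[47840]: fencing node node1'

attempts=$(printf '%s\n' "$log" | grep -c 'fencing node')
results=$(printf '%s\n' "$log" | grep -c 'fence .* success')

echo "attempts=$attempts results=$results"   # prints: attempts=1 results=0
```

If `attempts` is nonzero but `results` stays zero against the full log, the fence agent may never have reported success, and DLM/GFS2 recovery would then be suspect; the journal replay shown in the logs only proceeds once fenced reports the node dead. Bob's complementary check on the node returning Input/output error would be something like `dmesg | grep -i gfs2`, looking for withdraw messages.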