Re: Linux-cluster Digest, Vol 86, Issue 19

Hi Dominic,

Yes, the errors belong only to the passive path.
------------------------------

Message: 3
Date: Tue, 21 Jun 2011 18:22:49 +0530
From: dOminic <share2dom@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: Cluster Failover Failed
Message-ID: <BANLkTi=bAtD8BYp4_T5ksir=dRSAO2dq9Q@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Btw, how many HBAs are present in your box? Is the problem with scsi3 only?

Refer to https://access.redhat.com/kb/docs/DOC-2991, then set the filter.
Also, I would suggest opening a ticket with your Linux vendor if the IO errors
belong to active paths.

Do the reported IO errors belong to disks in the passive path group? You can
verify this in the multipath -ll output.
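
On RHEL 5 the multipath -ll output looks roughly like the sketch below (the
WWID, device names and size here are made up for illustration); the first path
group is the active one, and the second, marked [enabled], holds the passive
paths:

mpath2 (36006016xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) dm-4 DGC,RAID 5
[size=50G][features=1 queue_if_no_path][hwhandler=1 emc]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:0:2 sdb 8:16  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:2:2 sdm 8:192 [active][ready]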

regards,

On Sun, Jun 19, 2011 at 10:03 PM, dOminic <share2dom@xxxxxxxxx> wrote:

> Hi Balaji,
>
> Yes, the reported message is harmless ... However, you can try the following:
>
> 1) I would suggest setting the filter in lvm.conf so that it properly scans
> your mpath* devices and local disks (an example filter is sketched below).
> 2) Enable the blacklist section in multipath.conf, e.g.:
>
> blacklist {
>        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>        devnode "^hd[a-z]"
> }
>
> # multipath -v2
>
> Observe the box. Check whether that helps ...
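>
> As a sketch, the lvm.conf filter from (1) could look like this (assuming the
> local disk is sda and the multipathed LUNs show up as /dev/mapper/mpath*;
> adjust the patterns to your layout):
>
> filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sda[0-9]*$|", "r|.*|" ]
>
> Run vgscan afterwards and confirm LVM only picks up the devices you expect.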
>
>
> Regards,
>
>
> On Wed, Jun 15, 2011 at 12:16 AM, Balaji S <skjbalaji@xxxxxxxxx> wrote:
>
>> Hi,
>> In my setup I have implemented 10 two-node clusters running MySQL as the
>> cluster service, with an IPMI card as the fencing device.
>>
>> In /var/log/messages I keep getting errors like the ones below:
>>
>> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdm, sector 0
>> Jun 14 12:50:48 hostname kernel: sd 3:0:2:2: Device not ready: <6>:
>> Current: sense key: Not Ready
>> Jun 14 12:50:48 hostname kernel:     Add. Sense: Logical unit not ready,
>> manual intervention required
>> Jun 14 12:50:48 hostname kernel:
>> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdn, sector 0
>> Jun 14 12:50:48 hostname kernel: sd 3:0:2:4: Device not ready: <6>:
>> Current: sense key: Not Ready
>> Jun 14 12:50:48 hostname kernel:     Add. Sense: Logical unit not ready,
>> manual intervention required
>> Jun 14 12:50:48 hostname kernel:
>> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdp, sector 0
>> Jun 14 12:51:10 hostname kernel: sd 3:0:0:1: Device not ready: <6>:
>> Current: sense key: Not Ready
>> Jun 14 12:51:10 hostname kernel:     Add. Sense: Logical unit not ready,
>> manual intervention required
>> Jun 14 12:51:10 hostname kernel:
>> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdc, sector 0
>> Jun 14 12:51:10 hostname kernel: printk: 3 messages suppressed.
>> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdc, logical
>> block 0
>> Jun 14 12:51:10 hostname kernel: sd 3:0:0:2: Device not ready: <6>:
>> Current: sense key: Not Ready
>> Jun 14 12:51:10 hostname kernel:     Add. Sense: Logical unit not ready,
>> manual intervention required
>> Jun 14 12:51:10 hostname kernel:
>> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdd, sector 0
>> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdd, logical
>> block 0
>> Jun 14 12:51:10 hostname kernel: sd 3:0:0:4: Device not ready: <6>:
>> Current: sense key: Not Ready
>> Jun 14 12:51:10 hostname kernel:     Add. Sense: Logical unit not ready,
>> manual intervention required
>>
>>
>> When I check multipath -ll, all of these devices are in the passive
>> path.
>>
>> Environment :
>>
>> RHEL 5.4 & EMC SAN
>>
>> Please suggest how to overcome this issue. Any help will be highly appreciated.
>> Thanks in Advance
>>
>>
>> --
>> Thanks,
>> BSK
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>

------------------------------

Message: 4
Date: Tue, 21 Jun 2011 15:31:13 +0200
From: Miha Valencic <miha.valencic@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: Troubleshooting service relocation
Message-ID: <BANLkTi=eT93Bv3qeO0+t+EzZP=6yDYaV1Q@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Michael, I've configured the logging on RM and am now waiting for it to
switch nodes. Hopefully, I can see a reason why it is relocating.
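
For the record, the knob here is rgmanager's log level in cluster.conf; a
minimal sketch, assuming the pre-RHEL6 rgmanager syntax (log_level 7 = debug,
everything inside <rm> stays as it was):

<rm log_level="7">
    <!-- existing failoverdomains, resources and services unchanged -->
</rm>

Then bump config_version and propagate the configuration as usual.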

Thanks,
 Miha.

On Sat, Jun 18, 2011 at 11:24 AM, Michael Pye <michael@xxxxxxxxxx> wrote:

> On 17/06/2011 09:13, Miha Valencic wrote:
> > How can I turn on logging or what else can I check?
>
> Take a look at this knowledgebase article:
> https://access.redhat.com/kb/docs/DOC-53500
>
>

------------------------------

Message: 5
Date: Tue, 21 Jun 2011 09:57:38 -0400
From: "Nicolas Ross" <rossnick-lists@xxxxxxxxxxx>
To: "linux clustering" <linux-cluster@xxxxxxxxxx>
Subject: GFS2 fatal: filesystem consistency error
Message-ID: <AD364AF1E9D94C50B96231FB0320B1DE@versa>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
       reply-type=original

8-node cluster, Fibre Channel HBAs and disks accessed through a QLogic fabric.

I've been hit 3 times with this error, on different nodes:

GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file =
fs/gfs2/inode.c, line = 352
GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system
GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
GFS2: fsid=CyberCluster:GizServer.1: withdrawn
Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T
2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
[<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
[<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
[<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
[<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
[<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
[<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
[<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
[<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
[<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
[<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
[<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
[<ffffffff8118bf82>] ? iput+0x62/0x70
[<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
[<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
[<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81088660>] ? worker_thread+0x0/0x2a0
[<ffffffff8108dd96>] ? kthread+0x96/0xa0
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
[<ffffffff8108dd00>] ? kthread+0x0/0xa0
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
no_formal_ino = 9582
no_addr = 6698267
i_disksize = 6838
blocks = 0
i_goal = 6698304
i_diskflags = 0x00000000
i_height = 1
i_depth = 0
i_entries = 0
i_eattr = 0
GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
gdlm_unlock 5,66351b err=-22


Only with different inodes each time.

After that event, services running on that filesystem are marked failed and are
not moved over to another node. Any access to that fs yields an I/O error. The
server needed to be rebooted to work properly again.

I ran an fsck on that filesystem last night, and it did find some errors,
but nothing serious. Lots (really lots) of these:

Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Fix bitmap for block 5771602 (0x581152) ? (y/n)
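
For reference, the standard way to run that check on a clustered GFS2 volume is
to stop the services and unmount the filesystem on every node first, then run
it from a single node; the device path below is only an example:

fsck.gfs2 -y /dev/MyVG/GizServer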

And after completing the fsck, I started some services back up, and I got the
same error on another filesystem that is practically empty and used for small
utilities used throughout the cluster...

What should I do to find the source of this problem?



------------------------------

Message: 6
Date: Tue, 21 Jun 2011 10:42:40 -0400 (EDT)
From: Bob Peterson <rpeterso@xxxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Subject: Re: GFS2 fatal: filesystem consistency error
Message-ID:
       <1036238479.689034.1308667360488.JavaMail.root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

Content-Type: text/plain; charset=utf-8

----- Original Message -----
| 8-node cluster, Fibre Channel HBAs and disks accessed through a QLogic
| fabric.
|
| I've been hit 3 times with this error, on different nodes:
|
| GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency
| error
| GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
| GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc,
| file =
| fs/gfs2/inode.c, line = 352
| GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file
| system
| GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
| GFS2: fsid=CyberCluster:GizServer.1: withdrawn
| Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T
| 2.6.32-131.2.1.el6.x86_64 #1
| Call Trace:
| [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
| [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
| [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
| [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
| [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
| [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
| [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
| [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
| [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
| [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
| [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
| [<ffffffff8118bf82>] ? iput+0x62/0x70
| [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
| [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
| [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
| [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
| [<ffffffff8108dd96>] ? kthread+0x96/0xa0
| [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
| [<ffffffff8108dd00>] ? kthread+0x0/0xa0
| [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
| no_formal_ino = 9582
| no_addr = 6698267
| i_disksize = 6838
| blocks = 0
| i_goal = 6698304
| i_diskflags = 0x00000000
| i_height = 1
| i_depth = 0
| i_entries = 0
| i_eattr = 0
| GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
| gdlm_unlock 5,66351b err=-22
|
|
| Only with different inodes each time.
|
| After that event, services running on that filesystem are marked failed
| and are not moved over to another node. Any access to that fs yields an
| I/O error. The server needed to be rebooted to work properly again.
|
| I ran an fsck on that filesystem last night, and it did find some
| errors, but nothing serious. Lots (really lots) of these:
|
| Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
| Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
| Metadata type is 0 (free)
| Fix bitmap for block 5771602 (0x581152) ? (y/n)
|
| And after completing the fsck, I started some services back up, and I
| got the same error on another filesystem that is practically empty and
| used for small utilities used throughout the cluster...
|
| What should I do to find the source of this problem?

Hi,

I believe this is a GFS2 bug we've already solved.
Please contact Red Hat Support.

Regards,

Bob Peterson
Red Hat File Systems



------------------------------

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 86, Issue 19
*********************************************




--
Thanks,
Balaji S
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
