Re: [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

I applied the patches, then compiled and installed GlusterFS.
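
For the record, the build itself was the usual sequence (roughly; exact configure options may differ per setup):

# ./autogen.sh
# ./configure
# make && make install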

# glusterfs --version
glusterfs 3.7dev built on Oct 28 2014 12:03:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

# git log
commit 990ce16151c3af17e4cdaa94608b737940b60e4d
Author: Lalatendu Mohanty <lmohanty@xxxxxxxxxx>
Date:   Tue Jul 1 07:52:27 2014 -0400

    Posix: Brick failure detection fix for ext4 filesystem
...
...

I see the messages below.

From /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

The message "I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 39 times between [2014-10-28 05:58:09.209419] and [2014-10-28 06:00:06.226330]
[2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
[2014-10-28 06:00:09.226712] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd.
[2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
[2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
[2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
[2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management: readv on 

.....
.....

[2014-10-28 06:19:15.142867] I [glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
The message "I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 12 times between [2014-10-28 06:18:09.368752] and [2014-10-28 06:18:45.373063]
[2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
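
For the brick-side view, the brick log (by the usual naming convention this should be /var/log/glusterfs/bricks/zp2-brick2.log) would be the place to check whether the health-checker noticed the failure, e.g.:

# grep -i health /var/log/glusterfs/bricks/zp2-brick2.log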


dmesg output:

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.

Yet the brick is still shown as online:

# gluster volume status
Status of volume: repvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.246:/zp1/brick1                 49152   Y       4067
Brick 192.168.1.246:/zp2/brick2                 49153   Y       4078
NFS Server on localhost                         2049    Y       4092
Self-heal Daemon on localhost                   N/A     Y       4097
 
Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks
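
To confirm the brick process really is still running (and the status is not just stale), the PID from the status output can be checked directly:

# ps -p 4078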
 
# gluster volume info
 
Volume Name: repvol
Type: Replicate
Volume ID: ba1e7c6d-1e1c-45cd-8132-5f4fa4d2d22b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.246:/zp1/brick1
Brick2: 192.168.1.246:/zp2/brick2
Options Reconfigured:
storage.health-check-interval: 30
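
For completeness, that option was set with the standard volume-set command:

# gluster volume set repvol storage.health-check-interval 30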

Let me know if you need further information.

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 11:44 AM, Kiran Patil <kiran@xxxxxxxxxxxxx> wrote:
I changed "git fetch git://review.gluster.org/glusterfs" to "git fetch http://review.gluster.org/glusterfs" and now it works.
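
In full, the sequence that worked:

# git fetch http://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD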

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil <kiran@xxxxxxxxxxxxx> wrote:
Hi Niels,

I am getting "fatal: Couldn't find remote ref refs/changes/13/8213/9" error.

Steps to reproduce the issue:

1) # git clone git://review.gluster.org/glusterfs
Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
remote: Counting objects: 84921, done.
remote: Compressing objects: 100% (48307/48307), done.
remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
Resolving deltas: 100% (57264/57264), done.

2) # cd glusterfs
    # git branch
    * master

3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD
fatal: Couldn't find remote ref refs/changes/13/8213/9

Note: I also tried the above steps against the git repo https://github.com/gluster/glusterfs and the result is the same (presumably because the GitHub mirror does not carry Gerrit's refs/changes/* namespace).

Please let me know if I missed any steps.

Thanks,
Kiran.

On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
On Mon, Oct 27, 2014 at 05:19:13PM +0530, Kiran Patil wrote:
> Hi,
>
> I created a replicated volume with two bricks on the same node and copied
> some data to it.
>
> I then removed the disk hosting one of the bricks of the volume.
>
> storage.health-check-interval is set to 30 seconds.
>
> I can see that the disk is unavailable using the zpool command of ZFS on
> Linux, but gluster volume status still shows the brick process as running,
> even though it should have been shut down by now.
>
> Is this a bug in 3.6, given that it is listed as a feature
> (https://github.com/gluster/glusterfs/blob/release-3.6/doc/features/brick-failure-detection.md),
> or am I making a mistake here?

The initial detection of brick failures did not work for all filesystems,
and it may not work for ZFS either. A fix has been posted, but it has not
been merged into the master branch yet. Once the change has been merged, it
can be backported to 3.6 and 3.5.

You may want to test with the patch applied, and add your "+1 Verified" to
the change if it makes things work for you:
- http://review.gluster.org/8213

Cheers,
Niels

>
> [root@fractal-c92e gluster-3.6]# gluster volume status
> Status of volume: repvol
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.1.246:/zp1/brick1                 49154   Y       17671
> Brick 192.168.1.246:/zp2/brick2                 49155   Y       17682
> NFS Server on localhost                         2049    Y       17696
> Self-heal Daemon on localhost                   N/A     Y       17701
>
> Task Status of Volume repvol
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> [root@fractal-c92e gluster-3.6]# gluster volume info
>
> Volume Name: repvol
> Type: Replicate
> Volume ID: d4f992b1-1393-43b8-9fda-2e2b6e3b5039
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.246:/zp1/brick1
> Brick2: 192.168.1.246:/zp2/brick2
> Options Reconfigured:
> storage.health-check-interval: 30
>
> [root@fractal-c92e gluster-3.6]# zpool status zp2
>   pool: zp2
>  state: UNAVAIL
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool
> clear'.
>    see: http://zfsonlinux.org/msg/ZFS-8000-HC
>   scan: none requested
> config:
>
> NAME        STATE     READ WRITE CKSUM
> zp2         UNAVAIL      0     0     0  insufficient replicas
>   sdb       UNAVAIL      0     0     0
>
> errors: 2 data errors, use '-v' for a list
>
>
> Thanks,
> Kiran.

> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel




_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
