On Tue, Oct 28, 2014 at 01:10:56PM +0530, Kiran Patil wrote:
> I applied the patches, compiled and installed gluster.
>
> # glusterfs --version
> glusterfs 3.7dev built on Oct 28 2014 12:03:10
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> It is licensed to you under your choice of the GNU Lesser
> General Public License, version 3 or any later version (LGPLv3
> or later), or the GNU General Public License, version 2 (GPLv2),
> in all cases as published by the Free Software Foundation.
>
> # git log
> commit 990ce16151c3af17e4cdaa94608b737940b60e4d
> Author: Lalatendu Mohanty <lmohanty@xxxxxxxxxx>
> Date:   Tue Jul 1 07:52:27 2014 -0400
>
>     Posix: Brick failure detection fix for ext4 filesystem
> ...
> ...
>
> I see the messages below.

Many thanks Kiran! Do you have the messages from the brick that uses the
zp2 mountpoint?

There should also be a file with a timestamp of when the last check was
done successfully. If the brick is still running, this timestamp should
get updated every storage.health-check-interval seconds:

    /zp2/brick2/.glusterfs/health_check

Niels

>
> File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :
>
> The message "I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 39 times between [2014-10-28 05:58:09.209419] and [2014-10-28 06:00:06.226330]
> [2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
> [2014-10-28 06:00:09.226712] I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd.
> [2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
> [2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
> [2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management: readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid argument)
> [2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management: readv on
>
> .....
> .....
>
> [2014-10-28 06:19:15.142867] I [glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
> The message "I [MSGID: 106005] [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick 192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 12 times between [2014-10-28 06:18:09.368752] and [2014-10-28 06:18:45.373063]
> [2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
>
> dmesg output:
>
> SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.
>
> SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.
>
> SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has encountered an uncorrectable I/O failure and has been suspended.
>
> The brick is still online.
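A quick way to check whether the health_check timestamp mentioned above is
actually advancing (a sketch assuming the brick path Niels gives above and
the 30-second storage.health-check-interval configured on this volume; stat
and sleep are plain coreutils, nothing gluster-specific):

    # stat -c '%y  %n' /zp2/brick2/.glusterfs/health_check
    # sleep 30
    # stat -c '%y  %n' /zp2/brick2/.glusterfs/health_check

If the modification time does not change between the two calls while the
brick process is still shown as running, the health checker is not
completing its periodic check on that brick.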
>
> # gluster volume status
> Status of volume: repvol
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.1.246:/zp1/brick1                 49152   Y       4067
> Brick 192.168.1.246:/zp2/brick2                 49153   Y       4078
> NFS Server on localhost                         2049    Y       4092
> Self-heal Daemon on localhost                   N/A     Y       4097
>
> Task Status of Volume repvol
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> # gluster volume info
>
> Volume Name: repvol
> Type: Replicate
> Volume ID: ba1e7c6d-1e1c-45cd-8132-5f4fa4d2d22b
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.246:/zp1/brick1
> Brick2: 192.168.1.246:/zp2/brick2
> Options Reconfigured:
> storage.health-check-interval: 30
>
> Let me know if you need further information.
>
> Thanks,
> Kiran.
>
> On Tue, Oct 28, 2014 at 11:44 AM, Kiran Patil <kiran@xxxxxxxxxxxxx> wrote:
>
> > I changed "git fetch git://review.gluster.org/glusterfs" to
> > "git fetch http://review.gluster.org/glusterfs" and now it works.
> >
> > Thanks,
> > Kiran.
> >
> > On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil <kiran@xxxxxxxxxxxxx> wrote:
> >
> >> Hi Niels,
> >>
> >> I am getting a "fatal: Couldn't find remote ref refs/changes/13/8213/9"
> >> error.
> >>
> >> Steps to reproduce the issue:
> >>
> >> 1) # git clone git://review.gluster.org/glusterfs
> >> Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
> >> remote: Counting objects: 84921, done.
> >> remote: Compressing objects: 100% (48307/48307), done.
> >> remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
> >> Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
> >> Resolving deltas: 100% (57264/57264), done.
> >>
> >> 2) # cd glusterfs
> >> # git branch
> >> * master
> >>
> >> 3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD
> >> fatal: Couldn't find remote ref refs/changes/13/8213/9
> >>
> >> Note: I also tried the above steps on the git repo
> >> https://github.com/gluster/glusterfs and the result is the same as above.
> >>
> >> Please let me know if I missed any steps.
> >>
> >> Thanks,
> >> Kiran.
> >>
> >> On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> >>
> >>> On Mon, Oct 27, 2014 at 05:19:13PM +0530, Kiran Patil wrote:
> >>> > Hi,
> >>> >
> >>> > I created a replicated volume with two bricks on the same node and
> >>> > copied some data to it.
> >>> >
> >>> > Then I removed the disk which hosts one of the bricks of the volume.
> >>> >
> >>> > storage.health-check-interval is set to 30 seconds.
> >>> >
> >>> > I can see that the disk is unavailable using the zpool command of ZFS
> >>> > on Linux, but gluster volume status still displays the brick process
> >>> > as running, even though it should have been shut down by now.
> >>> >
> >>> > Is this a bug in 3.6, since it is listed as a feature
> >>> > (https://github.com/gluster/glusterfs/blob/release-3.6/doc/features/brick-failure-detection.md),
> >>> > or am I making a mistake here?
> >>>
> >>> The initial detection of brick failures did not work for all
> >>> filesystems, and it may not work for ZFS either. A fix has been posted,
> >>> but it has not been merged into the master branch yet. When the change
> >>> has been merged, it can get backported to 3.6 and 3.5.
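For anyone who wants to test the same change: the fetch sequence that ended
up working for Kiran above, with only the fetch URL switched from git:// to
http:// (same change ref as in his steps), looks like this:

    # git clone git://review.gluster.org/glusterfs
    # cd glusterfs
    # git fetch http://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD

After that, build and install as usual and re-run the disk-failure test with
storage.health-check-interval set on the volume.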
> >>>
> >>> You may want to test with the patch applied, and add your "+1 Verified"
> >>> to the change in case it makes it functional for you:
> >>> - http://review.gluster.org/8213
> >>>
> >>> Cheers,
> >>> Niels
> >>>
> >>> >
> >>> > [root@fractal-c92e gluster-3.6]# gluster volume status
> >>> > Status of volume: repvol
> >>> > Gluster process                                 Port    Online  Pid
> >>> > ------------------------------------------------------------------------------
> >>> > Brick 192.168.1.246:/zp1/brick1                 49154   Y       17671
> >>> > Brick 192.168.1.246:/zp2/brick2                 49155   Y       17682
> >>> > NFS Server on localhost                         2049    Y       17696
> >>> > Self-heal Daemon on localhost                   N/A     Y       17701
> >>> >
> >>> > Task Status of Volume repvol
> >>> > ------------------------------------------------------------------------------
> >>> > There are no active volume tasks
> >>> >
> >>> >
> >>> > [root@fractal-c92e gluster-3.6]# gluster volume info
> >>> >
> >>> > Volume Name: repvol
> >>> > Type: Replicate
> >>> > Volume ID: d4f992b1-1393-43b8-9fda-2e2b6e3b5039
> >>> > Status: Started
> >>> > Number of Bricks: 1 x 2 = 2
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: 192.168.1.246:/zp1/brick1
> >>> > Brick2: 192.168.1.246:/zp2/brick2
> >>> > Options Reconfigured:
> >>> > storage.health-check-interval: 30
> >>> >
> >>> > [root@fractal-c92e gluster-3.6]# zpool status zp2
> >>> >   pool: zp2
> >>> >  state: UNAVAIL
> >>> > status: One or more devices are faulted in response to IO failures.
> >>> > action: Make sure the affected devices are connected, then run 'zpool clear'.
> >>> >    see: http://zfsonlinux.org/msg/ZFS-8000-HC
> >>> >   scan: none requested
> >>> > config:
> >>> >
> >>> >         NAME        STATE     READ WRITE CKSUM
> >>> >         zp2         UNAVAIL      0     0     0  insufficient replicas
> >>> >           sdb       UNAVAIL      0     0     0
> >>> >
> >>> > errors: 2 data errors, use '-v' for a list
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Kiran.
> >>>
> >>> > _______________________________________________
> >>> > Gluster-devel mailing list
> >>> > Gluster-devel@xxxxxxxxxxx
> >>> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> >>>
> >>>
> >>
> >
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel