Re: add-brick: failed: Commit failed

Hi Ravi,

Thank you, that seems to have resolved the issue. After doing this, "gluster volume status all" showed gfs3 as online with a port and PID, but it didn't show any sync activity happening. At that point we loaded gfs3 with new firewall rules which explicitly allowed access from gfs1 and gfs2, and then "gluster volume status all" showed the files syncing. The gfs3 server should have allowed access from gfs1 and gfs2 by default anyway, but I now believe that perhaps this wasn't the case, and it may have been a firewall issue all along.
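For anyone hitting the same thing, the rules only need to allow Gluster's standard ports from the other two nodes. A minimal sketch using firewalld, run on gfs3 (a rough sketch rather than our exact rules; adjust the brick port range and zone for your setup):

firewall-cmd --permanent --add-port=24007-24008/tcp    # glusterd management
firewall-cmd --permanent --add-port=49152-49251/tcp    # brick ports, counting up from 49152
firewall-cmd --reload

Restricting these to the gfs1 and gfs2 addresses with rich rules is tidier, but the port list is the important part.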

Thanks for all your help.


On Sat, 25 May 2019 at 01:49, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

Hi David,

On 23/05/19 3:54 AM, David Cunningham wrote:
Hi Ravi,

Please see the log attached.
When I grep -E "Connected to |disconnected from" gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1" message. It looks like this temporary mount is not able to connect to the 2nd brick, which is why the lookup is failing due to lack of quorum.
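To run the same check yourself on gfs3 (a sketch; the log is assumed to be in the usual /var/log/glusterfs location, and a successful attempt should log a "Connected to" line for each of gvol0-client-0, -1 and -2):

grep -E "Connected to |disconnected from" /var/log/glusterfs/gvol0-add-brick-mount.log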
The output of "gluster volume status" is as follows. Should there be something listening on gfs3? I'm not sure whether its TCP Port and Pid showing as N/A is a symptom or a cause. Thank you.

# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A 

Can you see if the following steps help?

1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on both gfs1 and gfs2.

2. `gluster volume start gvol0 force`

3. Check if Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why.
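Putting the three steps together, the sequence would look roughly like this (a sketch; the getfattr and heal-info lines are only there to confirm the xattr took and to watch the heal, and are not strictly required):

On gfs1 and gfs2:

setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0
getfattr -n trusted.afr.gvol0-client-2 -e hex /nodirectwritedata/gluster/gvol0    # confirm the new value

Then from any one node:

gluster volume start gvol0 force
gluster volume status gvol0    # Brick-3 should now show a TCP port and PID
gluster volume heal gvol0 info    # pending entries should drain as gfs3 is healed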

Thanks,

Ravi


Self-heal Daemon on localhost               N/A       N/A        Y       19853
Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
Self-heal Daemon on gfs2                    N/A       N/A        Y       17614
 
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks


On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

If you are trying this again, please run `gluster volume set $volname client-log-level DEBUG` before attempting the add-brick, and attach the gvol0-add-brick-mount.log here. After that, you can change the client-log-level back to INFO.
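In other words, something like the following (a sketch, substituting gvol0 for $volname and reusing the add-brick command from earlier in this thread):

gluster volume set gvol0 client-log-level DEBUG
gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
gluster volume set gvol0 client-log-level INFO

and then attach gvol0-add-brick-mount.log (assumed to be under /var/log/glusterfs on gfs3).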

-Ravi

On 22/05/19 11:32 AM, Ravishankar N wrote:


On 22/05/19 11:23 AM, David Cunningham wrote:
Hi Ravi,

I'd already done exactly that before, where step 3 was a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the cleanup or reformat should be?
`rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me, David. Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not have any extended attributes set on it. Why fuse_first_lookup() is failing is a bit of a mystery to me at this point. :-(
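A quick way to confirm the directory is clean before re-adding it (a sketch; run on gfs3) is below: the first command should show an empty directory and the second should print no trusted.* attributes.

ls -la /nodirectwritedata/gluster/gvol0
getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0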
Regards,
Ravi

Thank you.


On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

Hmm, so the volume info seems to indicate that the add-brick was successful but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail).

Do you want to try removing and adding it again?

1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1

2. Check that `gluster volume info` is now back to a 1x2 volume on all nodes and that `gluster peer status` shows the peers connected on all nodes.

3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.

4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.

5. Check that the files are getting healed on to the new brick.
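For step 5, a quick way to watch the heal (a sketch; run from any node): the new brick should come online with a port and PID in the status output, and the heal-info list should shrink over time.

gluster volume status gvol0
gluster volume heal gvol0 info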

Thanks,
Ravi
On 22/05/19 6:50 AM, David Cunningham wrote:
Hi Ravi,

Certainly. On the existing two nodes:

gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-0=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

On the new node:

gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000001
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

Output of "gluster volume info" is the same on all 3 nodes and is:

# gluster volume info
 
Volume Name: gvol0
Type: Replicate
Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1:/nodirectwritedata/gluster/gvol0
Brick2: gfs2:/nodirectwritedata/gluster/gvol0
Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet


On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Hi David,
Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`?

Thanks,
Ravi
On 22/05/19 4:57 AM, David Cunningham wrote:
Hi Sanju,

Here's what glusterd.log says on the new arbiter server when trying to add the node:

[2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
[2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
[2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick

As before gvol0-add-brick-mount.log said:

[2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9
[2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down
[2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'.
[2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.

Here are the processes running on the new arbiter server:
# ps -ef | grep gluster
root      3466     1  0 20:13 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root      6832     1  0 May16 ?        00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     17841     1  0 May16 ?        00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs

Here are the files created on the new arbiter server:
# find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0
drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs

Thank you for your help!


On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srakonde@xxxxxxxxxx> wrote:
David,

Can you please attach the glusterd logs? As the error message says, the commit failed on the arbiter node, so we might be able to find the issue on that node.

On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:


On Fri, 17 May 2019 at 06:01, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
Hello,

We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below.

We are running glusterfs 5.6.1. Thanks in advance for any assistance!

On existing node gfs1, trying to add new arbiter node gfs3:

# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
volume add-brick: failed: Commit failed on gfs3. Please check log file for details.

This looks like a glusterd issue. Please check the glusterd logs for more info.
Adding the glusterd dev to this thread. Sanju, can you take a look?
 
Regards,
Nithya

On new node gfs3 in gvol0-add-brick-mount.log:

[2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f
[2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down
[2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'.
[2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'.

Processes running on new node gfs3:

# ps -ef | grep gluster
root      6832     1  0 20:17 ?        00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     15799     1  0 20:17 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root     16856 16735  0 21:21 pts/0    00:00:00 grep --color=auto gluster

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Thanks,
Sanju


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
