Hi David,
On 23/05/19 3:54 AM, David Cunningham wrote:
Hi Ravi,
Please see the log attached.
When I grep -E "Connected to |disconnected from"
gvol0-add-brick-mount.log, I don't see a "Connected to
gvol0-client-1". It looks like this temporary mount is not able to
connect to the 2nd brick, which is why the lookup is failing due to
lack of quorum.
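The same check can be scripted so it is easy to repeat after each add-brick attempt. This is a minimal Python sketch. It assumes the connect and disconnect messages contain the literal text "Connected to <volname>-client-N" and "disconnected from <volname>-client-N", which is what the grep pattern above relies on. The function names are hypothetical helpers, not part of Gluster.

```python
import re
from typing import Dict, Iterable, List

# Same two events the grep -E pattern looks for. Assumption: the client
# translator name follows the phrase, e.g. "Connected to gvol0-client-0, ...".
EVENT_RE = re.compile(r"(Connected to|disconnected from) (\S+-client-\d+)")


def last_connection_state(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each client translator to its last seen state (True = connected)."""
    state: Dict[str, bool] = {}
    for line in lines:
        for event, client in EVENT_RE.findall(line):
            state[client] = event == "Connected to"
    return state


def clients_not_connected(lines: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected clients that are not connected at the end of the log."""
    state = last_connection_state(lines)
    return [client for client in expected if not state.get(client, False)]
```

For this volume, the expected names would be gvol0-client-0, gvol0-client-1 and gvol0-client-2. The lines would come from iterating over the opened gvol0-add-brick-mount.log.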
The output of "gluster volume status" is as follows.
Should there be something listening on gfs3? I'm not sure
whether its TCP Port and Pid showing N/A is a symptom or the
cause. Thank you.
# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A
Can you see if the following steps help?
1. Run `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on both gfs1 and gfs2.
2. Run `gluster volume start gvol0 force`.
3. Check whether Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why.
Thanks,
Ravi
Self-heal Daemon on localhost               N/A       N/A        Y       19853
Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
Self-heal Daemon on gfs2                    N/A       N/A        Y       17614
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks
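To spot an offline brick programmatically, for example after the `gluster volume start gvol0 force` step, here is a minimal Python sketch that parses the process rows of the status table. It assumes the column layout shown above, where each row ends with TCP port, RDMA port, Online and Pid. Tolerating a process name printed on its own line is only a precaution in case the output is wrapped. The helper names are hypothetical.

```python
from typing import List, NamedTuple


class ProcessRow(NamedTuple):
    name: str       # e.g. "Brick gfs3:/nodirectwritedata/gluster/gvol0"
    tcp_port: str   # "N/A" when nothing is listening
    rdma_port: str
    online: bool
    pid: str


def parse_status_rows(text: str) -> List[ProcessRow]:
    """Parse the process rows of `gluster volume status` output."""
    rows: List[ProcessRow] = []
    pending = ""  # a process name printed on its own line, if the row was wrapped
    for line in text.splitlines():
        parts = f"{pending} {line}".split()
        # A process row ends with: TCP port, RDMA port, Online (Y/N), Pid.
        if len(parts) >= 5 and parts[-2] in ("Y", "N"):
            rows.append(ProcessRow(" ".join(parts[:-4]), parts[-4], parts[-3],
                                   parts[-2] == "Y", parts[-1]))
            pending = ""
        elif line.startswith(("Brick ", "Self-heal Daemon")):
            pending = line.strip()
        else:
            pending = ""
    return rows


def offline(rows: List[ProcessRow]) -> List[str]:
    """Names of processes whose Online column is N."""
    return [row.name for row in rows if not row.online]
```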
If you are trying this again, please run `gluster volume set
$volname client-log-level DEBUG` before attempting the
add-brick, and attach the gvol0-add-brick-mount.log here.
After that, you can change the client-log-level back to
INFO.
-Ravi
On 22/05/19 11:32 AM, Ravishankar N wrote:
On 22/05/19 11:23 AM, David Cunningham wrote:
Hi Ravi,
I'd already done exactly that before, where step 3 was a simple
`rm -rf /nodirectwritedata/gluster/gvol0`. Do you have another
suggestion for what the cleanup or reformat should be?
`rm -rf /nodirectwritedata/gluster/gvol0` does look okay
to me, David. Basically, '/nodirectwritedata/gluster/gvol0'
must be empty and must not have any extended attributes
set on it. Why fuse_first_lookup() is failing is a bit of
a mystery to me at this point. :-(
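Ravi's two conditions (the directory is empty and has no extended attributes) can be checked before re-running add-brick. This is a minimal Python sketch, and `check_brick_dir` is a hypothetical helper. It reports every name that `os.listxattr` returns, so on an SELinux host a `security.selinux` label will also be listed and has to be judged by hand.

```python
import os
from typing import Iterable, List


def brick_dir_problems(entries: Iterable[str], xattr_names: Iterable[str]) -> List[str]:
    """List everything that would make a directory unsuitable as a fresh brick."""
    problems = [f"leftover entry: {name}" for name in sorted(entries)]
    problems += [f"leftover xattr: {name}" for name in sorted(xattr_names)]
    return problems


def check_brick_dir(path: str) -> List[str]:
    """Inspect a real directory (Linux only).

    Run this as root: unprivileged processes cannot see trusted.* xattrs,
    so a non-root run could report a directory as clean when it is not.
    """
    return brick_dir_problems(os.listdir(path), os.listxattr(path))
```

An empty result means the directory passes both checks. The leftover `.glusterfs` directory from a failed attempt would show up as a leftover entry.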
Regards,
Ravi
Hmm, so the volume info seems to indicate that
the add-brick was successful but the gfid xattr
is missing on the new brick (as are the actual
files, barring the .glusterfs folder, according
to your previous mail).
Do you want to try removing and adding it
again?
1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
2. Check that `gluster volume info` is now back to a 1x2 volume on all nodes and that `gluster peer status` shows all nodes connected.
3. Clean up or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.
4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
5. Check that the files are getting healed onto the new brick.
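Step 2 has to be repeated on every node, so it may be worth scripting. This is a minimal Python sketch that parses `gluster volume info` text in the layout of David's output in this thread. `parse_volume_info` and `is_plain_replica_2` are hypothetical helper names. Collecting the output from each node, and checking `gluster peer status`, are not covered.

```python
import re
from typing import Dict, List


def parse_volume_info(text: str) -> Dict[str, object]:
    """Extract name, type, volume ID, status, brick count and bricks."""
    info: Dict[str, object] = {}
    bricks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        brick = re.match(r"Brick\d+:\s*(\S+)(\s+\(arbiter\))?$", line)
        count = re.match(r"Number of Bricks:.*=\s*(\d+)$", line)
        if brick:
            bricks.append(brick.group(1) + (" (arbiter)" if brick.group(2) else ""))
        elif count:
            # "1 x (2 + 1) = 3" or "1 x 2 = 2": keep the total after "=".
            info["brick_count"] = int(count.group(1))
        elif ":" in line:
            key, value = line.split(":", 1)
            if key in ("Volume Name", "Type", "Volume ID", "Status"):
                info[key] = value.strip()
    info["bricks"] = bricks
    return info


def is_plain_replica_2(info: Dict[str, object]) -> bool:
    """True once remove-brick has taken the volume back to a 1 x 2 replica."""
    bricks = info.get("bricks", [])
    return info.get("brick_count") == 2 and not any("(arbiter)" in b for b in bricks)
```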
Thanks,
Ravi
On 22/05/19 6:50 AM, David Cunningham wrote:
Hi Ravi,
Certainly. On the existing two nodes:
gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-0=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
On the new node:
gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000001
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
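Comparing these dumps by eye is error-prone. This is a minimal Python sketch that parses `getfattr -d -m. -e hex` output and decodes the AFR changelog values. It rests on one assumption, which is not stated anywhere in this thread: each `trusted.afr.*` value is 12 bytes holding three big-endian 32-bit pending counters, in the order data, metadata, entry. The helper names are hypothetical.

```python
import struct
from typing import Dict, Tuple


def parse_getfattr(text: str) -> Dict[str, str]:
    """Parse `getfattr -d -m. -e hex` output into {xattr name: hex value}."""
    xattrs: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            xattrs[name] = value
    return xattrs


def afr_pending(hex_value: str) -> Tuple[int, int, int]:
    """Decode an AFR changelog value as (data, metadata, entry) counters.

    Assumes the 12-byte layout of three big-endian 32-bit integers.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return data, metadata, entry


def summarize_brick_root(text: str) -> Dict[str, object]:
    """Report whether trusted.gfid is present and decode every trusted.afr.* value."""
    xattrs = parse_getfattr(text)
    return {
        "has_gfid": "trusted.gfid" in xattrs,
        "afr": {name: afr_pending(value)
                for name, value in xattrs.items() if name.startswith("trusted.afr.")},
    }
```

Under that reading, gfs3's root has one pending entry operation marked dirty and no `trusted.gfid` at all, while gfs1 and gfs2 show all-zero counters. The value in Ravi's setfattr step, 0x000000000000000100000001, decodes to (0, 1, 1), i.e. metadata and entry pending for gvol0-client-2.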
Output of "gluster volume info" is the same on all 3 nodes and is:
# gluster volume info
Volume Name: gvol0
Type: Replicate
Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1:/nodirectwritedata/gluster/gvol0
Brick2: gfs2:/nodirectwritedata/gluster/gvol0
Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
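The `trusted.glusterfs.volume-id` xattr on each brick root holds the same 128-bit value as the `Volume ID` line above, written as raw hex. All three bricks here carry 0xfb5af69e1c3e41648b23c1d7bec9b1b6, which matches fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6. Comparing the two is therefore a quick way to confirm that a brick was stamped for this volume. This is a minimal Python sketch of that comparison, and the helper names are hypothetical.

```python
import uuid


def volume_id_from_xattr(hex_value: str) -> uuid.UUID:
    """Convert a trusted.glusterfs.volume-id value (0x + 32 hex digits) to a UUID."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return uuid.UUID(hex=digits)


def brick_belongs_to_volume(xattr_hex: str, volume_id: str) -> bool:
    """Compare a brick root's volume-id xattr with the Volume ID from volume info."""
    return volume_id_from_xattr(xattr_hex) == uuid.UUID(volume_id)
```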
Hi David,
Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0`
output of all bricks and the output of `gluster volume info`?
Thanks,
Ravi
On 22/05/19 4:57 AM, David Cunningham wrote:
Hi Sanju,
Here's what glusterd.log says on the new arbiter server when trying to add the node:
[2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
[2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
[2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
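When log entries arrive wrapped, as they do in mail, it helps to pull out only the errors. This is a minimal Python sketch. It assumes one entry per line in the layout seen above: timestamp, one-letter severity, an optional MSGID, the source location, then the message. `log_entries` is a hypothetical helper, not a Gluster tool. The same layout appears in the mount log excerpts in this thread.

```python
import re
from typing import Dict, Iterable, List, Optional

# "[date time] LEVEL [MSGID: n] [file.c:line:function] message"; MSGID is optional.
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<message>.*)$"
)


def log_entries(lines: Iterable[str],
                levels: Iterable[str] = ("E",)) -> List[Dict[str, Optional[str]]]:
    """Return parsed entries whose severity letter is in `levels`."""
    wanted = set(levels)
    entries = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and match.group("level") in wanted:
            entries.append(match.groupdict())  # msgid is None when absent
    return entries
```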
As before, gvol0-add-brick-mount.log said:
[2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9
[2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down
[2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'.
[2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.
Here are the processes running on the new arbiter server:
# ps -ef | grep gluster
root      3466     1  0 20:13 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root      6832     1  0 May16 ?        00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     17841     1  0 May16 ?        00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
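Neither process listing from gfs3 in this thread appears to include a brick process. This is a minimal Python sketch of that check. It assumes, and nothing in this thread confirms, that brick processes run as `glusterfsd` with the brick path somewhere on their command line. `brick_process_lines` is a hypothetical helper.

```python
from typing import Iterable, List


def brick_process_lines(ps_lines: Iterable[str], brick_path: str) -> List[str]:
    """Return `ps -ef` lines that look like a brick process for brick_path.

    Assumption: bricks run as glusterfsd and the brick path appears
    somewhere on their command line. glusterd, glustershd and the fuse
    client (all shown as glusterd/glusterfs) are not matched.
    """
    return [line for line in ps_lines
            if "glusterfsd" in line and brick_path in line]
```

Against the listing above, it returns an empty list for /nodirectwritedata/gluster/gvol0. If the assumption holds, that would be consistent with Brick gfs3 showing N/A for its TCP port and PID in `gluster volume status`.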
Here are the files created on the new arbiter server:
# find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0
drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs
Thank you for your help!
David,
Can you please attach the glusterd logs? As the error message
says, the commit failed on the arbiter node, so we might be
able to find the issue on that node.
Hello,
We're adding an arbiter node to an existing volume and having an issue. Can anyone help?
The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below.
We are running glusterfs 5.6.1. Thanks in advance for any assistance!
On existing node gfs1, trying to add new arbiter node gfs3:
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
volume add-brick: failed: Commit failed on gfs3. Please check log file for details.
This looks like a glusterd issue. Please check the glusterd logs for more info.
Adding the glusterd dev to this thread. Sanju, can you take a look?
Regards,
Nithya
On new node gfs3 in gvol0-add-brick-mount.log:
[2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f
[2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down
[2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'.
[2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'.
Processes running on new node gfs3:
# ps -ef | grep gluster
root      6832     1  0 20:17 ?        00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     15799     1  0 20:17 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root     16856 16735  0 21:21 pts/0    00:00:00 grep --color=auto gluster
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users