gluster server daemon connection refused

Hi Group,

I'm having some difficulty debugging a problem (or maybe more than
one problem) with a replica 3 gluster file system. The basic setup
is three machines, all CentOS 7 (up to date), running gluster 3.8
(provided by the CentOS repos), with XFS brick partitions.

At a (so far random) point in time the gluster server daemons stop
communicating with each other. The glusterfs processes for each
volume on each server are still running though. If I restart them
manually (systemctl restart glusterfsd glusterd) communication is
once again established.
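
One way to confirm from a client or peer whether a daemon port is
still accepting connections is a plain TCP probe. The following is a
minimal sketch, not a gluster tool: the helper names probe and
probe_all are made up for illustration. It assumes the host names
four, five and six from the volume info, the brick port 49156 seen in
the logs, and glusterd's management port 24007.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on that port (matches "Connection refused").
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets dropped by a firewall.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable network, and so on.
        return f"error: {exc}"


def probe_all(hosts, ports, timeout=3.0):
    """Probe every host/port pair and return {(host, port): status}."""
    return {(h, p): probe(h, p, timeout) for h in hosts for p in ports}
```

Calling probe_all(("four", "five", "six"), (24007, 49156)) while the
problem is occurring would show whether the brick port is refused on
every server or only on some of them.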

I can't find any log message that obviously points to the source of
the problem.

In general, all servers show a lot of different warnings and errors
in the brick logs almost the whole time clients are connected. All
clients are Fedora 25 machines using the FUSE gluster client (3.8).

From the brick log on the storage servers:
It seems the quota sub system has some problems:
[2017-05-22 11:41:36.323233] W [marker-quota.c:33:mq_loc_copy]
0-marker: src loc is not valid
[2017-05-22 11:41:36.323269] E
[marker-quota.c:1472:mq_initiate_quota_task] 0-home-marker: loc copy
failed

And there are a lot of warnings about xattrs like this:
[2017-05-22 11:32:28.111240] W [MSGID: 113001]
[posix.c:4212:posix_get_ancestry_non_directory] 0-home-posix:
listxattr failed
on/srv/gluster_home/brick/.glusterfs/34/06/3406a026-596b-40b3-8dd1-8186e3072031
[No such file or directory]

And I get some of these:
[2017-05-22 11:41:29.591258] E [MSGID: 115081]
[server-rpc-fops.c:1201:server_fstat_cbk] 0-home-server: 923966:
FSTAT -2 (197f90db-ff62-4436-b200-f4347b6c2ba0) ==> (Operation not
permitted) [Operation not permitted]



There are some warnings in the glustershd.log but none prior to the
disconnect give me a clue as to what's going on:

[2017-05-22 07:37:48.003835] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-home-client-1:
remote operation failed. Path:
<gfid:b8812b00-7bda-4871-ab39-82d1eb8a2607>
(b8812b00-7bda-4871-ab39-82d1eb8a2607) [No such file or directory]
[2017-05-22 07:37:48.003842] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-home-client-0:
remote operation failed. Path:
<gfid:b8812b00-7bda-4871-ab39-82d1eb8a2607>
(b8812b00-7bda-4871-ab39-82d1eb8a2607) [No such file or directory]
[2017-05-22 07:37:48.003896] W [MSGID: 114031]
[client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-home-client-2:
remote operation failed. Path:
<gfid:b8812b00-7bda-4871-ab39-82d1eb8a2607>
(b8812b00-7bda-4871-ab39-82d1eb8a2607) [No such file or directory]
[2017-05-22 11:14:36.832051] W [socket.c:590:__socket_rwv]
0-home-client-1: readv on X.X.X.8:49156 failed (No data available)
[2017-05-22 11:14:36.832103] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-home-client-1: disconnected from
home-client-1. Client process will keep trying to connect to
glusterd until brick's port is available
[2017-05-22 11:14:36.832295] W [socket.c:590:__socket_rwv]
0-home-client-2: readv on X.X.X.9:49156 failed (No data available)
[2017-05-22 11:14:36.832317] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-home-client-2: disconnected from
home-client-2. Client process will keep trying to connect to
glusterd until brick's port is available
[2017-05-22 11:14:36.832328] W [MSGID: 108001]
[afr-common.c:4467:afr_notify] 0-home-replicate-0: Client-quorum is
not met
[2017-05-22 11:14:47.423671] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-home-client-1: changing port to 49156 (from 0)
[2017-05-22 11:14:47.430133] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-home-client-2: changing port to 49156 (from 0)
[2017-05-22 11:14:47.434348] E [socket.c:2309:socket_connect_finish]
0-home-client-1: connection to X.X.X.8:49156 failed (Connection refused)
[2017-05-22 11:14:47.438925] E [socket.c:2309:socket_connect_finish]
0-home-client-2: connection to X.X.X.9:49156 failed (Connection refused)
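
To line up the disconnect sequence across the three servers, the
connection-related entries can be pulled out of each log. The sketch
below is only an illustration under stated assumptions: it expects
one entry per line (as in the log file itself, not the wrapped form
in this mail), the header layout visible in the excerpts above, and
a keyword list chosen by hand from those excerpts. The names LogEntry
and connection_events are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional

# Header layout as seen in the excerpts:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<message>.*)$"
)

# Hand-picked phrases from the excerpts above (an assumption, not a full list).
KEYWORDS = ("readv on", "disconnected from", "connection refused",
            "quorum is not met")


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    msgid: Optional[int]
    source: str
    xlator: str
    message: str


def connection_events(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed entries whose message mentions a connection problem."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # Not a log header line; skip it.
        message = match["message"]
        if not any(key in message.lower() for key in KEYWORDS):
            continue
        yield LogEntry(
            timestamp=datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"),
            level=match["level"],
            msgid=int(match["msgid"]) if match["msgid"] else None,
            source=match["source"],
            xlator=match["xlator"],
            message=message,
        )
```

Running connection_events over an open glustershd.log on each server
and sorting the combined results by timestamp gives a single
timeline of the disconnects.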


Gluster Volume Info:
Volume Name: home
Type: Replicate
Volume ID: 47875706-1ed1-48b7-bc5d-0dca8ec5cd58
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: four:/srv/gluster_home/brick
Brick2: five:/srv/gluster_home/brick
Brick3: six:/srv/gluster_home/brick
Options Reconfigured:
server.root-squash: off
network.ping-timeout: 10
cluster.quorum-type: auto
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

Installed gluster server (CentOS 7) version:
glusterfs-server-3.8.11-1.el7.x86_64

Installed gluster client (Fedora 25) version:
glusterfs-3.8.9-1.fc24.x86_64


Does anybody have an idea why the communication might be interrupted?

And also, why are there so many warnings about extended attributes
of nonexistent files?

I'm running basically the same setup with CentOS clients for oVirt
without any gluster warnings or errors at all. Currently I'm at a
loss. Any help is highly appreciated!
Thanks
Richard Neuboeck

-- 
---------------------------------------------------------------------
[a] Department for Theoretical Chemistry
    University of Vienna
    Waehringer Strasse 17/3/304, 1090 Wien, Austria
[p] +43 1 4277 52735
[m] hawk@xxxxxxxxxxxxxxxx
---------------------------------------------------------------------


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
