It looks like this is to do with the stale port issue.

I think it's pretty clear from the output below that `gluster volume status` shows the digitalcorpora brick process on gluster-2 as having the same TCP port as the public volume brick on that node, 49156, but it is actually listening on 49154. So although the brick process is technically up, nothing is talking to it. I am surprised I don't see more errors in the brick log for brick8/public. It also explains the whack-a-mole problem: every time I kill and restart the daemon, it must be grabbing the port of another brick, and then that volume's brick goes silent.

I killed all the brick processes and restarted glusterd, and everything came up OK.
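Not from the thread, but here is a sketch of the cross-check I did by hand below, automated: compare the TCP port that `gluster volume status` advertises for each local brick against the port its glusterfsd PID actually has in LISTEN state. It assumes net-tools `netstat` (as on these hosts) and that brick lines, even when wrapped, end with the four columns `<tcp-port> <rdma-port> <online> <pid>`.

```shell
# Sketch, under the assumptions above; not a supported gluster tool.

advertised_ports() {
    # keep only lines whose trailing fields look like "<port> <rdma> Y <pid>"
    awk '$(NF-1) == "Y" && $NF ~ /^[0-9]+$/ && $(NF-3) ~ /^[0-9]+$/ {
        print $(NF-3), $NF
    }'
}

listen_port() {
    # first listening TCP port owned by this pid's glusterfsd
    # (empty for bricks hosted on other nodes)
    netstat -ltnp 2>/dev/null |
        awk -v proc="$1/glusterfsd" \
            '$NF == proc { n = split($4, a, ":"); print a[n]; exit }'
}

gluster volume status 2>/dev/null | advertised_ports | while read -r port pid; do
    actual=$(listen_port "$pid")
    if [ -n "$actual" ] && [ "$actual" != "$port" ]; then
        echo "stale port: pid $pid advertises $port but listens on $actual"
    fi
done
```

On the state shown below this would flag pid 125708 (advertised 49156, listening on 49154) and stay silent for the healthy public bricks.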
[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra 49156 0 Y 125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora 49152 0 Y 12345
Brick gluster0:/export/brick7/digitalcorpor
a 49152 0 Y 16098
Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks
[root@gluster-2 ~]# glv status public | grep -v ^Self
Status of volume: public
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster1:/export/brick8/public 49156 0 Y 3519
Brick gluster2:/export/brick8/public 49156 0 Y 8578
Brick gluster0:/export/brick8/public 49156 0 Y 3176
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp 0 0 0.0.0.0:49156 0.0.0.0:* LISTEN 8578/glusterfsd
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp 0 0 0.0.0.0:49154 0.0.0.0:* LISTEN 125708/glusterfsd
[root@gluster-2 ~]# ps -c --pid 125708 8578
PID CLS PRI TTY STAT TIME COMMAND
8578 TS 19 ? Ssl 224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS 19 ? Ssl 0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#
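Atin also asked below for `gluster get-state` output. The command dumps glusterd's view of the cluster, including the port it has mapped for each brick, to a flat key/value file, which makes the port map easy to grep. A hedged sketch; the key names (e.g. `Volume1.Brick1.port`) are my assumption from a 3.10-era dump and may differ:

```shell
# Sketch: pull the brick path/port keys out of a `gluster get-state` dump.
# Key names like "Volume1.Brick1.port" are assumed, not verified here.

brick_port_keys() {
    grep -E '\.Brick[0-9]+\.(path|port):' "$1"
}

# `gluster get-state` prints the path of the state file it wrote as its
# last word; grab that and grep it.
statefile=$(gluster get-state 2>/dev/null | awk '{ print $NF; exit }')
if [ -f "$statefile" ]; then
    brick_port_keys "$statefile"
fi
```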
On 24 October 2017 at 13:56, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <ajneil.tech@xxxxxxxxx> wrote:

Peculiar behaviour. If I kill the glusterfs brick daemon and restart glusterd, then the brick becomes available, but one of my other volumes' bricks on the same server goes down in the same way; it's like whack-a-mole. Gluster version 3.10.6, replica 3 volume; the daemon is present but does not appear to be functioning. Any ideas?

The subject and the data look contradictory to me. The brick log (what you shared) doesn't have a cleanup_and_exit() trigger for a shutdown. Are you sure the brick is down? OTOH, I see a mismatch of port for brick7/digitalcorpora, where the brick process has 49154 but gluster volume status shows 49152. There is an issue with stale ports which we're trying to address through https://review.gluster.org/18541 . But could you specify what exactly the problem is? Is it the stale port, or the conflict between the volume status output and the actual brick health? If it's the latter, I'd need further information, such as the output of the "gluster get-state" command from the same node.
[root@gluster-2 bricks]# glv status digitalcorpora
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098
Self-heal Daemon on localhost N/A N/A Y 126625
Self-heal Daemon on gluster1 N/A N/A Y 15405
Self-heal Daemon on gluster0 N/A N/A Y 18584
Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks
[root@gluster-2 bricks]# glv heal digitalcorpora info
Brick gluster-2:/export/brick7/digitalcorpora
Status: Transport endpoint is not connected
Number of entries: -
Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3
Brick gluster0:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3
[2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down
[2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
[2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-unix' is not recognized
[2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-null' is not recognized
[2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'ping-timeout' is not recognized
[2017-10-24 17:19:04.616203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'rpc-auth-allow-insecure' is not recognized
[2017-10-24 17:19:04.616215] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616237] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is not recognized
[2017-10-24 17:19:04.616248] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616283] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-quota: option 'timeout' is not recognized
[2017-10-24 17:19:04.616367] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-trash: option 'brick-path' is not recognized
Final graph:
+------------------------------------------------------------------------------+
1: volume digitalcorpora-posix
2: type storage/posix
3: option glusterd-uuid 032c17f5-8cc9-445f-aa45-897b5a066b43
4: option directory /export/brick7/digitalcorpora
5: option volume-id 61efe58a-ae5b-4d8b-b9f9-67829867c442
6: option brick-uid 36
7: option brick-gid 36
8: end-volume
9:
10: volume digitalcorpora-trash
11: type features/trash
12: option trash-dir .trashcan
13: option brick-path /export/brick7/digitalcorpora
14: option trash-internal-op off
15: subvolumes digitalcorpora-posix
16: end-volume
17:
18: volume digitalcorpora-changetimerecorder
19: type features/changetimerecorder
20: option db-type sqlite3
21: option hot-brick off
22: option db-name digitalcorpora.db
23: option db-path /export/brick7/digitalcorpora/.glusterfs/
24: option record-exit off
25: option ctr_link_consistency off
26: option ctr_lookupheal_link_timeout 300
27: option ctr_lookupheal_inode_timeout 300
28: option record-entry on
29: option ctr-enabled off
30: option record-counters off
31: option ctr-record-metadata-heat off
32: option sql-db-cachesize 12500
33: option sql-db-wal-autocheckpoint 25000
34: subvolumes digitalcorpora-trash
35: end-volume
36:
37: volume digitalcorpora-changelog
38: type features/changelog
39: option changelog-brick /export/brick7/digitalcorpora
40: option changelog-dir /export/brick7/digitalcorpora/.glusterfs/changelogs
41: option changelog-barrier-timeout 120
42: subvolumes digitalcorpora-changetimerecorder
43: end-volume
44:
45: volume digitalcorpora-bitrot-stub
46: type features/bitrot-stub
47: option export /export/brick7/digitalcorpora
48: subvolumes digitalcorpora-changelog
49: end-volume
50:
51: volume digitalcorpora-access-control
52: type features/access-control
53: subvolumes digitalcorpora-bitrot-stub
54: end-volume
55:
56: volume digitalcorpora-locks
57: type features/locks
58: subvolumes digitalcorpora-access-control
59: end-volume
60:
61: volume digitalcorpora-worm
62: type features/worm
63: option worm off
64: option worm-file-level off
65: subvolumes digitalcorpora-locks
66: end-volume
67:
68: volume digitalcorpora-read-only
69: type features/read-only
70: option read-only off
71: subvolumes digitalcorpora-worm
72: end-volume
73:
74: volume digitalcorpora-leases
75: type features/leases
76: option leases off
77: subvolumes digitalcorpora-read-only
78: end-volume
79:
80: volume digitalcorpora-upcall
81: type features/upcall
82: option cache-invalidation off
83: subvolumes digitalcorpora-leases
84: end-volume
85:
86: volume digitalcorpora-io-threads
87: type performance/io-threads
88: subvolumes digitalcorpora-upcall
89: end-volume
90:
91: volume digitalcorpora-marker
92: type features/marker
93: option volume-uuid 61efe58a-ae5b-4d8b-b9f9-67829867c442
94: option timestamp-file /var/lib/glusterd/vols/digitalcorpora/marker.tstamp
95: option quota-version 0
96: option xtime off
97: option gsync-force-xtime off
98: option quota off
99: option inode-quota off
100: subvolumes digitalcorpora-io-threads
101: end-volume
102:
103: volume digitalcorpora-barrier
104: type features/barrier
105: option barrier disable
106: option barrier-timeout 120
107: subvolumes digitalcorpora-marker
108: end-volume
109:
110: volume digitalcorpora-index
111: type features/index
112: option index-base /export/brick7/digitalcorpora/.glusterfs/indices
113: option xattrop-dirty-watchlist trusted.afr.dirty
114: option xattrop-pending-watchlist trusted.afr.digitalcorpora-
115: subvolumes digitalcorpora-barrier
116: end-volume
117:
118: volume digitalcorpora-quota
119: type features/quota
120: option volume-uuid digitalcorpora
121: option server-quota off
122: option timeout 0
123: option deem-statfs off
124: subvolumes digitalcorpora-index
125: end-volume
126:
127: volume digitalcorpora-io-stats
128: type debug/io-stats
129: option unique-id /export/brick7/digitalcorpora
130: option log-level WARNING
131: option latency-measurement off
132: option count-fop-hits off
133: subvolumes digitalcorpora-quota
134: end-volume
135:
136: volume /export/brick7/digitalcorpora
137: type performance/decompounder
138: option rpc-auth-allow-insecure on
139: option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
140: option auth-path /export/brick7/digitalcorpora
141:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
142:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
143: subvolumes digitalcorpora-io-stats
144: end-volume
145:
146: volume digitalcorpora-server
147: type protocol/server
148: option transport.socket.listen-port 49154
149: option rpc-auth.auth-glusterfs on
150: option rpc-auth.auth-unix on
151: option rpc-auth.auth-null on
152: option transport-type tcp
153: option transport.address-family inet
154:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
155:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
156: option auth-path /export/brick7/digitalcorpora
157: option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
158: option ping-timeout 42
159: option transport.socket.keepalive 1
160: option rpc-auth-allow-insecure on
161: option transport.tcp-user-timeout 0
162: option transport.socket.keepalive-time 20
163: option transport.socket.keepalive-interval 2
164: option transport.socket.keepalive-count 9
165: subvolumes /export/brick7/digitalcorpora
166: end-volume
167:
+------------------------------------------------------------------------------+
[2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 129.174.126.87:24007 failed (No data available)
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users