Re: brick is down but gluster volume status says it's fine

It looks like this is to do with the stale port issue.

The output below shows that volume status reports the digitalcorpora brick process on gluster-2 as using TCP port 49156, the same port as the public volume's brick on that server, but the process is actually listening on 49154. So although the brick process is technically up, nothing is talking to it. I am surprised I don't see more errors in the brick log for brick8/public. It also explains the whack-a-mole problem: every time I kill and restart the daemon, it must be grabbing the port of another brick, and then that volume's brick goes silent.

I killed all the brick processes and restarted glusterd, and everything came up OK.


[root@gluster-2 ~]# glv status digitalcorpora | grep -v ^Self
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098
 
Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@gluster-2 ~]# glv status public  | grep -v ^Self
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/export/brick8/public        49156     0          Y       3519
Brick gluster2:/export/brick8/public        49156     0          Y       8578
Brick gluster0:/export/brick8/public        49156     0          Y       3176
 
Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@gluster-2 ~]# netstat -pant | grep 8578 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      8578/glusterfsd    
[root@gluster-2 ~]# netstat -pant | grep 125708 | grep 0.0.0.0
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      125708/glusterfsd  
[root@gluster-2 ~]# ps -c  --pid  125708 8578
   PID CLS PRI TTY      STAT   TIME COMMAND
  8578 TS   19 ?        Ssl  224:20 /usr/sbin/glusterfsd -s gluster2 --volfile-id public.gluster2.export-brick8-public -p /var/lib/glusterd/vols/public/run/gluster2-export-bric
125708 TS   19 ?        Ssl    0:08 /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpor
[root@gluster-2 ~]#
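
The comparison I did by hand above (port reported by volume status vs. port the glusterfsd pid is really listening on) could be scripted. What follows is only a minimal sketch in Python, not anything shipped with gluster: the function names are made up for illustration, it assumes the column layout shown in the output above (including the CLI wrapping long brick names onto a second line), and it assumes you pass in the names this host is known by (here both gluster-2 and gluster2), since pids of bricks on other nodes mean nothing locally.

```python
import re

# Trailing columns of a `gluster volume status` row: TCP port, RDMA port, Online, Pid.
COLS = re.compile(r"^(.*?)\s+(\d+|N/A)\s+(\d+|N/A)\s+([YN])\s+(\d+|N/A)\s*$")


def parse_volume_status(text):
    """Map brick name -> (tcp_port, pid) from `gluster volume status` output."""
    bricks = {}
    pending = None  # first half of a brick name that the CLI wrapped
    for line in text.splitlines():
        m = COLS.match(line)
        if line.startswith("Brick "):
            if m:
                name, pending = m.group(1)[len("Brick "):].strip(), None
            else:
                # Name only, columns follow on the next line.
                pending = line[len("Brick "):].strip()
                continue
        elif pending is not None and m:
            name, pending = pending + m.group(1).strip(), None
        else:
            pending = None
            continue
        port, pid = m.group(2), m.group(5)
        bricks[name] = (int(port) if port.isdigit() else None,
                        int(pid) if pid.isdigit() else None)
    return bricks


def parse_listeners(netstat_text):
    """Map pid -> set of listening TCP ports from `netstat -pant` output."""
    listeners = {}
    for line in netstat_text.splitlines():
        f = line.split()
        if len(f) >= 7 and f[5] == "LISTEN":
            pid = f[6].split("/")[0]
            if pid.isdigit():
                port = int(f[3].rsplit(":", 1)[1])
                listeners.setdefault(int(pid), set()).add(port)
    return listeners


def stale_ports(status_text, netstat_text, local_names):
    """List (brick, reported_port, actual_ports) for local bricks that disagree."""
    listeners = parse_listeners(netstat_text)
    mismatches = []
    for brick, (port, pid) in parse_volume_status(status_text).items():
        host = brick.split(":", 1)[0]
        if host not in local_names or pid not in listeners:
            continue  # remote brick, or no listening socket found for that pid
        if port not in listeners[pid]:
            mismatches.append((brick, port, sorted(listeners[pid])))
    return mismatches


# Sample input copied from the output in this mail.
STATUS = """\
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster2:/export/brick8/public        49156     0          Y       8578
"""
NETSTAT = """\
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      8578/glusterfsd
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      125708/glusterfsd
"""

if __name__ == "__main__":
    for brick, reported, actual in stale_ports(STATUS, NETSTAT, {"gluster-2", "gluster2"}):
        print(f"{brick}: status says {reported}, listening on {actual}")
```

With the sample data above, only the brick7/digitalcorpora brick on gluster-2 is flagged (reported 49156, listening on 49154); the brick8/public brick matches. On a live system you would feed in the real output of the two commands instead of the pasted strings.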



On 24 October 2017 at 13:56, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On Tue, Oct 24, 2017 at 11:13 PM, Alastair Neil <ajneil.tech@xxxxxxxxx> wrote:
Gluster version 3.10.6, replica 3 volume; the daemon is present but does not appear to be functioning.

Peculiar behaviour: if I kill the glusterfs brick daemon and restart glusterd, the brick becomes available, but a brick of one of my other volumes on the same server goes down in the same way. It's like whack-a-mole.

Any ideas?

The subject and the data look contradictory to me. The brick log you shared doesn't have a cleanup_and_exit() trigger for a shutdown. Are you sure the brick is down? OTOH, I see a port mismatch for brick7/digitalcorpora: the brick process has 49154, but gluster volume status shows 49152. There is a stale port issue that we're trying to address through https://review.gluster.org/18541 . But could you specify what exactly the problem is? Is it the stale port, or the conflict between the volume status output and actual brick health? If it's the latter, I'd need further information, such as the output of the "gluster get-state" command from the same node.



[root@gluster-2 bricks]# glv status digitalcorpora
Status of volume: digitalcorpora
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-2:/export/brick7/digitalcorpo
ra                                          49156     0          Y       125708
Brick gluster1.vsnet.gmu.edu:/export/brick7
/digitalcorpora                             49152     0          Y       12345
Brick gluster0:/export/brick7/digitalcorpor
a                                           49152     0          Y       16098
Self-heal Daemon on localhost               N/A       N/A        Y       126625
Self-heal Daemon on gluster1                N/A       N/A        Y       15405
Self-heal Daemon on gluster0                N/A       N/A        Y       18584
 
Task Status of Volume digitalcorpora
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@gluster-2 bricks]# glv heal digitalcorpora info
Brick gluster-2:/export/brick7/digitalcorpora
Status: Transport endpoint is not connected
Number of entries: -

Brick gluster1.vsnet.gmu.edu:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3

Brick gluster0:/export/brick7/digitalcorpora
/.trashcan
/DigitalCorpora/hello2.txt
/DigitalCorpora
Status: Connected
Number of entries: 3

[2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f6f83c9de25] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55a148eeb135] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55a148eeaf5b] ) 0-: received signum (15), shutting down
[2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --volfile-id digitalcorpora.gluster-2.export-brick7-digitalcorpora -p /var/lib/glusterd/vols/digitalcorpora/run/gluster-2-export-brick7-digitalcorpora.pid -S /var/run/gluster/f8e0b3393e47dc51a07c6609f9b40841.socket --brick-name /export/brick7/digitalcorpora -l /var/log/glusterfs/bricks/export-brick7-digitalcorpora.log --xlator-option *-posix.glusterd-uuid=032c17f5-8cc9-445f-aa45-897b5a066b43 --brick-port 49154 --xlator-option digitalcorpora-server.listen-port=49154)
[2017-10-24 17:18:59.285279] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-10-24 17:19:04.611723] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2017-10-24 17:19:04.611815] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-digitalcorpora-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2017-10-24 17:19:04.615974] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2017-10-24 17:19:04.616033] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-unix' is not recognized
[2017-10-24 17:19:04.616070] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'rpc-auth.auth-null' is not recognized
[2017-10-24 17:19:04.616134] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616177] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-server: option 'ping-timeout' is not recognized
[2017-10-24 17:19:04.616203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'rpc-auth-allow-insecure' is not recognized
[2017-10-24 17:19:04.616215] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.addr./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth-path' is not recognized
[2017-10-24 17:19:04.616237] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password' is not recognized
[2017-10-24 17:19:04.616248] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/export/brick7/digitalcorpora: option 'auth.login./export/brick7/digitalcorpora.allow' is not recognized
[2017-10-24 17:19:04.616283] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-quota: option 'timeout' is not recognized
[2017-10-24 17:19:04.616367] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-digitalcorpora-trash: option 'brick-path' is not recognized
Final graph:
+------------------------------------------------------------------------------+
  1: volume digitalcorpora-posix
  2:     type storage/posix
  3:     option glusterd-uuid 032c17f5-8cc9-445f-aa45-897b5a066b43
  4:     option directory /export/brick7/digitalcorpora
  5:     option volume-id 61efe58a-ae5b-4d8b-b9f9-67829867c442
  6:     option brick-uid 36
  7:     option brick-gid 36
  8: end-volume
  9: 
 10: volume digitalcorpora-trash
 11:     type features/trash
 12:     option trash-dir .trashcan
 13:     option brick-path /export/brick7/digitalcorpora
 14:     option trash-internal-op off
 15:     subvolumes digitalcorpora-posix
 16: end-volume
 17: 
 18: volume digitalcorpora-changetimerecorder
 19:     type features/changetimerecorder
 20:     option db-type sqlite3
 21:     option hot-brick off
 22:     option db-name digitalcorpora.db
 23:     option db-path /export/brick7/digitalcorpora/.glusterfs/
 24:     option record-exit off
 25:     option ctr_link_consistency off
 26:     option ctr_lookupheal_link_timeout 300
 27:     option ctr_lookupheal_inode_timeout 300
 28:     option record-entry on
 29:     option ctr-enabled off
 30:     option record-counters off
 31:     option ctr-record-metadata-heat off
 32:     option sql-db-cachesize 12500
 33:     option sql-db-wal-autocheckpoint 25000
 34:     subvolumes digitalcorpora-trash
 35: end-volume
 36: 
 37: volume digitalcorpora-changelog
 38:     type features/changelog
 39:     option changelog-brick /export/brick7/digitalcorpora
 40:     option changelog-dir /export/brick7/digitalcorpora/.glusterfs/changelogs
 41:     option changelog-barrier-timeout 120
 42:     subvolumes digitalcorpora-changetimerecorder
 43: end-volume
 44: 
 45: volume digitalcorpora-bitrot-stub
 46:     type features/bitrot-stub
 47:     option export /export/brick7/digitalcorpora
 48:     subvolumes digitalcorpora-changelog
 49: end-volume
 50: 
 51: volume digitalcorpora-access-control
 52:     type features/access-control
 53:     subvolumes digitalcorpora-bitrot-stub
 54: end-volume
 55: 
 56: volume digitalcorpora-locks
 57:     type features/locks
 58:     subvolumes digitalcorpora-access-control
 59: end-volume
 60: 
 61: volume digitalcorpora-worm
 62:     type features/worm
 63:     option worm off
 64:     option worm-file-level off
 65:     subvolumes digitalcorpora-locks
 66: end-volume
 67: 
 68: volume digitalcorpora-read-only
 69:     type features/read-only
 70:     option read-only off
 71:     subvolumes digitalcorpora-worm
 72: end-volume
 73: 
 74: volume digitalcorpora-leases
 75:     type features/leases
 76:     option leases off
 77:     subvolumes digitalcorpora-read-only
 78: end-volume
 79: 
 80: volume digitalcorpora-upcall
 81:     type features/upcall
 82:     option cache-invalidation off
 83:     subvolumes digitalcorpora-leases
 84: end-volume
 85: 
 86: volume digitalcorpora-io-threads
 87:     type performance/io-threads
 88:     subvolumes digitalcorpora-upcall
 89: end-volume
 90: 
 91: volume digitalcorpora-marker
 92:     type features/marker
 93:     option volume-uuid 61efe58a-ae5b-4d8b-b9f9-67829867c442
 94:     option timestamp-file /var/lib/glusterd/vols/digitalcorpora/marker.tstamp
 95:     option quota-version 0
 96:     option xtime off
 97:     option gsync-force-xtime off
 98:     option quota off
 99:     option inode-quota off
100:     subvolumes digitalcorpora-io-threads
101: end-volume
102: 
103: volume digitalcorpora-barrier
104:     type features/barrier
105:     option barrier disable
106:     option barrier-timeout 120
107:     subvolumes digitalcorpora-marker
108: end-volume
109: 
110: volume digitalcorpora-index
111:     type features/index
112:     option index-base /export/brick7/digitalcorpora/.glusterfs/indices
113:     option xattrop-dirty-watchlist trusted.afr.dirty
114:     option xattrop-pending-watchlist trusted.afr.digitalcorpora-
115:     subvolumes digitalcorpora-barrier
116: end-volume
117: 
118: volume digitalcorpora-quota
119:     type features/quota
120:     option volume-uuid digitalcorpora
121:     option server-quota off
122:     option timeout 0
123:     option deem-statfs off
124:     subvolumes digitalcorpora-index
125: end-volume
126: 
127: volume digitalcorpora-io-stats
128:     type debug/io-stats
129:     option unique-id /export/brick7/digitalcorpora
130:     option log-level WARNING
131:     option latency-measurement off
132:     option count-fop-hits off
133:     subvolumes digitalcorpora-quota
134: end-volume
135: 
136: volume /export/brick7/digitalcorpora
137:     type performance/decompounder
138:     option rpc-auth-allow-insecure on
139:     option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
140:     option auth-path /export/brick7/digitalcorpora
141:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
142:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
143:     subvolumes digitalcorpora-io-stats
144: end-volume
145: 
146: volume digitalcorpora-server
147:     type protocol/server
148:     option transport.socket.listen-port 49154
149:     option rpc-auth.auth-glusterfs on
150:     option rpc-auth.auth-unix on
151:     option rpc-auth.auth-null on
152:     option transport-type tcp
153:     option transport.address-family inet
154:     option auth.login./export/brick7/digitalcorpora.allow b17f2513-7d9c-4174-a0c5-de4a752d46ca
155:     option auth.login.b17f2513-7d9c-4174-a0c5-de4a752d46ca.password 6c007ad0-b5a2-4564-8464-300f8317e5c7
156:     option auth-path /export/brick7/digitalcorpora
157:     option auth.addr./export/brick7/digitalcorpora.allow 129.174.125.204,129.174.93.204
158:     option ping-timeout 42
159:     option transport.socket.keepalive 1
160:     option rpc-auth-allow-insecure on
161:     option transport.tcp-user-timeout 0
162:     option transport.socket.keepalive-time 20
163:     option transport.socket.keepalive-interval 2
164:     option transport.socket.keepalive-count 9
165:     subvolumes /export/brick7/digitalcorpora
166: end-volume
167: 
+------------------------------------------------------------------------------+
[2017-10-24 17:22:21.438620] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 129.174.126.87:24007 failed (No data available)
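
The port the brick was actually started with, and whether it was shut down by a signal, can both be read from the log lines above (the "Started running" line carries --brick-port, the cleanup_and_exit line carries the signal number). A rough sketch for pulling those out of a brick log follows. The function name and regular expressions are my own and assume the 3.10.6 message wording seen in this log; the sample lines are trimmed copies of the two lines above.

```python
import re

# Patterns assume the message wording seen in this 3.10.6 brick log.
START = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*Started running .*--brick-port (?P<port>\d+)")
STOP = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*cleanup_and_exit.*received signum \((?P<sig>\d+)\)")


def brick_lifecycle(log_text):
    """Return ("start", timestamp, port) and ("stop", timestamp, signal) events in log order."""
    events = []
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            events.append(("start", m.group("ts"), int(m.group("port"))))
            continue
        m = STOP.match(line)
        if m:
            events.append(("stop", m.group("ts"), int(m.group("sig"))))
    return events


# Trimmed copies of the two relevant lines from the log in this mail.
SAMPLE_LOG = """\
[2017-10-24 17:18:48.288505] W [glusterfsd.c:1360:cleanup_and_exit] 0-: received signum (15), shutting down
[2017-10-24 17:18:59.270384] I [MSGID: 100030] [glusterfsd.c:2503:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.6 (args: /usr/sbin/glusterfsd -s gluster-2 --brick-name /export/brick7/digitalcorpora --brick-port 49154)
"""

if __name__ == "__main__":
    for kind, ts, value in brick_lifecycle(SAMPLE_LOG):
        print(kind, ts, value)
```

The last "start" event gives the port to compare against what volume status reports for that brick.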


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

