Hello,
We have a 2-node replicated (mirrored) GlusterFS cluster, and one of the bricks (on server1) has recently gone offline:
Status of volume: gvol0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick server2:/nodirectwritedata/gluster/gvol0 49153 0 Y 22015
Brick server1:/nodirectwritedata/gluster/gvol0 N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A Y 22037
Self-heal Daemon on server1 N/A N/A Y 3320
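For reference, the output above is from the standard status command:

gluster volume status gvol0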
This happened during the day with no action on our part to cause it. However, the glusterfsd process is still running on server1. In the brick log, nodirectwritedata-gluster-gvol0.log, we see lines like the following before the brick went offline:
... lines like the following repeat all the way back to the start of the log file...
[2021-08-25 20:07:12.002764 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null) [Invalid argument]
[2021-08-25 20:07:12.002820 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info [{frame=337803516}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:e9b02d95-722b-49d1-b5a3-b3f1eca78ef4-GRAPH_ID:0-PID:3320-HOST:server1-PC_NAME:gvol0-client-1-RECON_NO:-0}, {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:08:54.003409 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null) [Invalid argument]
[2021-08-25 20:08:54.003476 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info [{frame=337814045}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:3b94ba5f-38d9-4277-9aa4-444ebe65f760-GRAPH_ID:0-PID:22037-HOST:server2-PC_NAME:gvol0-client-1-RECON_NO:-0}, {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
When the brick went offline it started logging this instead:
[2021-08-25 20:10:29.894516 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721) [0x7ff871924721] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39) [0x7ff87875d059] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d) [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL [Invalid argument]
[2021-08-25 20:10:30.346692 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721) [0x7ff871924721] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39) [0x7ff87875d059] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d) [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL [Invalid argument]
... repeated...
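For reference, we confirmed that the glusterfsd brick process is still present on server1 with something like:

ps aux | grep glusterfsd

even though the volume status above reports that brick as offline.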
In glustershd.log we have the logging below. GlusterFS is version 9.0 running on Ubuntu 18.04 (I notice the logging below mentions "Program-name=GlusterFS 4.x v1", which seems strange). Would anyone have a suggestion as to what could be wrong?
Thank you in advance!
[2021-08-25 20:10:25.741098 +0000] W [socket.c:767:__socket_rwv] 0-gvol0-client-0: readv on 192.168.0.201:49152 failed (No data available)
[2021-08-25 20:10:25.741151 +0000] I [MSGID: 114018] [client.c:2229:client_rpc_notify] 0-gvol0-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=gvol0-client-0}]
[2021-08-25 20:10:28.741971 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49153 (from 0)
[2021-08-25 20:10:28.742543 +0000] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-gvol0-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-08-25 20:10:28.743059 +0000] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-gvol0-client-0: Connected, attached to remote volume [{conn-name=gvol0-client-0}, {remote_subvol=/nodirectwritedata/gluster/gvol0}]
[2021-08-25 20:10:28.746963 +0000] I [MSGID: 108026] [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0: performing data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775
[2021-08-25 20:10:28.752371 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0: Completed data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775. sources=[1] sinks=0
[2021-08-25 20:10:28.754305 +0000] I [MSGID: 108026] [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0: performing data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2
[2021-08-25 20:10:28.761193 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0: Completed data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2. sources=[1] sinks=0
... repeated many times and then...
[2021-08-25 20:10:44.803924 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:10:44.803984 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:20:45.132601 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
... repeated...
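In case it's relevant, unless there is a better approach our current thinking is to check the heal state and then force-start the volume to bring the brick back online, along the lines of:

gluster volume heal gvol0 info
gluster volume start gvol0 force

but we would first like to understand what caused the brick to go offline.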
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782