Re: Brick offline problem

Hello,

The same thing has happened again, this time with the following broadcast message. Does anyone have any idea what's going on? Thanks in advance.

Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: [2021-08-27 02:12:12.120501 +0000] M [MSGID: 113075] [posix-helpers.c:2211:posix_health_check_thread_proc] 0-gvol0-posix: health-check failed, going down
Aug 26 22:12:12 server1 nodirectwritedata-gluster-gvol0[7068]: [2021-08-27 02:12:12.120597 +0000] M [MSGID: 113075] [posix-helpers.c:2229:posix_health_check_thread_proc] 0-gvol0-posix: still alive! -> SIGTERM
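
For anyone who wants to pull these entries out of the log programmatically, here is a minimal Python sketch that splits the Gluster part of such a line into its fields. The regular expression is only inferred from the excerpts in this thread, and the names LOG_RE and parse_line are made up for illustration, so treat it as a starting point rather than a complete parser. Note that the bracketed timestamp is UTC (+0000), while the syslog prefix appears to be local time.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the log excerpts in this thread (an assumption):
# [timestamp +0000] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \+0000\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not fit."""
    m = LOG_RE.search(line)
    if m is None:
        # Lines in another layout (for example those carrying a call trace)
        # are not handled by this sketch.
        return None
    rec = m.groupdict()
    # The bracketed timestamp is UTC, so attach that timezone explicitly.
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )
    return rec
```

Calling parse_line() on the first line above should give level "M", MSGID "113075" and the function posix_health_check_thread_proc, which makes it easy to search a large log for the moment the health check failed.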


On Thu, 26 Aug 2021 at 15:16, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
Hello,

We have a 2-node mirrored GlusterFS cluster, and one of the bricks (on server1) recently went offline:

Status of volume: gvol0
Gluster process                                 TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------
Brick server2:/nodirectwritedata/gluster/gvol0  49153     0          Y       22015
Brick server1:/nodirectwritedata/gluster/gvol0  N/A       N/A        N       N/A
Self-heal Daemon on localhost                   N/A       N/A        Y       22037
Self-heal Daemon on server1                     N/A       N/A        Y       3320
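
If it helps with monitoring, the status output above can be checked for offline bricks with a few lines of Python. This is only a sketch under assumptions: the function name offline_bricks is made up, and it expects each brick row on a single line as pasted here (the CLI may wrap long brick paths onto a separate line, which this does not handle).

```python
def offline_bricks(status_text):
    """Return the brick names whose Online column is not 'Y'."""
    offline = []
    for line in status_text.splitlines():
        fields = line.split()
        # Expected row: Brick <host:path> <TCP port> <RDMA port> <Online> <Pid>
        if len(fields) == 6 and fields[0] == "Brick":
            if fields[4] != "Y":
                offline.append(fields[1])
    return offline
```

Fed the status text from this message, it would report only the server1 brick, since that is the one row with "N" in the Online column.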

This happened during the day with no action on our part to cause it. However, glusterfsd is still running on server1. In nodirectwritedata-gluster-gvol0.log we see lines like this before the brick went offline:

... same as following lines back to the start of the log file...
[2021-08-25 20:07:12.002764 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null) [Invalid argument]
[2021-08-25 20:07:12.002820 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info [{frame=337803516}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:e9b02d95-722b-49d1-b5a3-b3f1eca78ef4-GRAPH_ID:0-PID:3320-HOST:server1-PC_NAME:gvol0-client-1-RECON_NO:-0}, {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:08:54.003409 +0000] E [MSGID: 113002] [posix-entry-ops.c:682:posix_mkdir] 0-gvol0-posix: gfid is null for (null) [Invalid argument]
[2021-08-25 20:08:54.003476 +0000] E [MSGID: 115056] [server-rpc-fops_v2.c:497:server4_mkdir_cbk] 0-gvol0-server: MKDIR info [{frame=337814045}, {MKDIR_path=}, {uuid_utoa=00000000-0000-0000-0000-000000000001}, {bname=}, {client=CTX_ID:3b94ba5f-38d9-4277-9aa4-444ebe65f760-GRAPH_ID:0-PID:22037-HOST:server2-PC_NAME:gvol0-client-1-RECON_NO:-0}, {error-xlator=gvol0-posix}, {errno=22}, {error=Invalid argument}]

When the brick went offline it started logging this instead:

[2021-08-25 20:10:29.894516 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721) [0x7ff871924721] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39) [0x7ff87875d059] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d) [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL [Invalid argument]
[2021-08-25 20:10:30.346692 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.0/xlator/storage/posix.so(+0x37721) [0x7ff871924721] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_int32+0x39) [0x7ff87875d059] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_with_ref+0x7d) [0x7ff87875ca7d] ) 0-dict: dict OR key (readdir-filter-directories) is NULL [Invalid argument]
... repeated...

In glustershd.log we have logging as below. Would anyone have a suggestion of what could be wrong? GlusterFS is version 9.0 running on Ubuntu 18.04 (I notice the logging below mentions "Program-name=GlusterFS 4.x v1" which is strange).
Thank you in advance!

[2021-08-25 20:10:25.741098 +0000] W [socket.c:767:__socket_rwv] 0-gvol0-client-0: readv on 192.168.0.201:49152 failed (No data available)
[2021-08-25 20:10:25.741151 +0000] I [MSGID: 114018] [client.c:2229:client_rpc_notify] 0-gvol0-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=gvol0-client-0}]
[2021-08-25 20:10:28.741971 +0000] I [rpc-clnt.c:1968:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49153 (from 0)
[2021-08-25 20:10:28.742543 +0000] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 0-gvol0-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-08-25 20:10:28.743059 +0000] I [MSGID: 114046] [client-handshake.c:857:client_setvolume_cbk] 0-gvol0-client-0: Connected, attached to remote volume [{conn-name=gvol0-client-0}, {remote_subvol=/nodirectwritedata/gluster/gvol0}]
[2021-08-25 20:10:28.746963 +0000] I [MSGID: 108026] [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0: performing data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775
[2021-08-25 20:10:28.752371 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0: Completed data selfheal on 0bd79a48-73e1-4c87-9c75-fc18dc158775. sources=[1]  sinks=0  
[2021-08-25 20:10:28.754305 +0000] I [MSGID: 108026] [afr-self-heal-data.c:347:afr_selfheal_data_do] 0-gvol0-replicate-0: performing data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2
[2021-08-25 20:10:28.761193 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1745:afr_log_selfheal] 0-gvol0-replicate-0: Completed data selfheal on b8295e57-c99a-4a20-8504-fb7a2e8fb7f2. sources=[1]  sinks=0
... repeated many times and then...
[2021-08-25 20:10:44.803924 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:10:44.803984 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-08-25 20:20:45.132601 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gvol0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
... repeated...
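
Since these logs repeat the same entries many times, a quick tally by level, MSGID and function can show which messages dominate. The following Python sketch does that. The pattern is inferred from the lines quoted in this thread, the names KEY_RE and summarize are made up for illustration, and lines without a MSGID are simply skipped.

```python
import re
from collections import Counter

# Matches "] LEVEL [MSGID: n] [file:line:function]" as seen in the excerpts.
KEY_RE = re.compile(r"\] ([A-Z]) \[MSGID: (\d+)\] \[[^:\]]+:\d+:([^\]]+)\]")


def summarize(lines):
    """Count log lines per (level, msgid, function) triple."""
    counts = Counter()
    for line in lines:
        m = KEY_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts
```

For example, summarize(open("glustershd.log")).most_common(10) would list the ten most frequent message types in that file, assuming it is in the current directory.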

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

