I have set up a similar test environment with two VMs on my PC, identical to the one in production.
Everything works fine.
But when I restart node 2, everything starts and works fine, except that the volume status of node 2 is offline:
[root@virt2 ~]# gluster volume status gfsvol1
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       970
Self-heal Daemon on virt1.local             N/A       N/A        Y       8608

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks
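To see why the brick process is not running, the glusterd journal and the tail of the brick log can be checked, for example (log path taken from this setup):

journalctl -b -u glusterd                                 # glusterd messages since this boot
tail -n 50 /var/log/glusterfs/bricks/gfsvol1-brick1.log   # last brick-side errors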
I have found this workaround: when I re-run "gluster volume start gfsvol1 force", the volume comes back online:
[root@virt2 ~]# gluster volume start gfsvol1 force
volume start: gfsvol1: success
[root@virt2 ~]# gluster volume status gfsvol1
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
Brick virt2.local:/gfsvol1/brick1           49153     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       970
Self-heal Daemon on virt1.local             N/A       N/A        Y       8608

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks
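Since this is a replica 2 volume, after forcing the brick back online it is probably worth confirming that self-heal has caught up (standard gluster CLI, volume name as above):

gluster volume heal gfsvol1 info    # entries still pending heal on each brick
gluster volume heal gfsvol1         # trigger a heal if anything is pending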
But if I reboot the node 2 server, the volume is offline again as soon as the system has started.
Also, if I restart the glusterd service on node 2, the volume comes back online:
systemctl restart glusterd
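To verify the suspected ordering, systemd can show when glusterd started relative to the network units (assuming a standard systemd setup, nothing gluster-specific):

systemd-analyze critical-chain glusterd.service        # time chain of units leading up to glusterd
systemctl list-dependencies --after glusterd.service   # units glusterd is ordered after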
It seems to be a systemd service startup ordering problem: glusterd appears to be started before the network is online.
I have tried modifying the systemd unit:
- from this
After=network.target
Before=network-online.target
- to this
After=network.target network-online.target
#Before=network-online.target
and now when I restart the node 2 server, everything works fine and the volume is always online.
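One caveat with editing the unit file directly: a package update may overwrite /usr/lib/systemd/system/glusterd.service, and a plain drop-in cannot undo the packaged Before=network-online.target (After=/Before= lines are additive in drop-ins). A full override copy seems the safer way to keep the change; a sketch:

systemctl edit --full glusterd.service
# copies the unit to /etc/systemd/system, which takes precedence
# over /usr/lib and survives package updates; in the copy, apply:
#   After=network.target network-online.target
#   Wants=network-online.target   # ordering alone does not activate the target
systemctl daemon-reload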
What is wrong?
Let me know, and many thanks for the help.
Dario
On Tue, 07/09/2021 at 09:46 +0200, Dario Lesca wrote:
These are the last lines of the /var/log/glusterfs/bricks/gfsvol1-brick1.log log:

[2021-09-06 21:29:02.165238 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
[2021-09-06 21:29:02.165365 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
[2021-09-06 21:29:02.165402 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1
[2021-09-06 21:29:02.179387 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49144 failed (No data available)
[2021-09-06 21:29:02.179451 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}]
[2021-09-06 21:29:02.179877 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0
[2021-09-06 21:29:10.254230 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
[2021-09-06 21:29:10.254283 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
[2021-09-06 21:29:10.254300 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1
[2021-09-06 21:29:10.272069 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49140 failed (No data available)
[2021-09-06 21:29:10.272133 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}]
[2021-09-06 21:29:10.272430 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0

I have a reserved network adapter directly connecting the two servers, with dedicated IPs 172.16.3.1/30 and 172.16.3.2/30, named virt1.local and virt2.local via /etc/hosts.
In these logs I also see the real server name (... HOST:s-virt1.realdomain.it-PC_NAME: ...), which has another IP on another network.
This cluster is now in production and hosts some VMs.
What is the best way to solve this dangerous situation without risk?
Many thanks
Dario

On Tue, 07/09/2021 at 05:28 +0000, Strahil Nikolov wrote:
No, it's not normal.
Go to virt2; in the /var/log/glusterfs directory you will find 'bricks'. Check the logs in bricks for more information.
Best Regards,
Strahil Nikolov
On Tue, Sep 7, 2021 at 1:13, Dario Lesca <d.lesca@xxxxxxxxxx> wrote:
Hello everybody!
I'm a novice with gluster. I have set up my first cluster with two nodes.
This is the current volume info:

[root@s-virt1 ~]# gluster volume info gfsvol1
Volume Name: gfsvol1
Type: Replicate
Volume ID: 5bad4a23-58cc-44d7-8195-88409720b941
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: virt1.local:/gfsvol1/brick1
Brick2: virt2.local:/gfsvol1/brick1
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
storage.owner-uid: 107
storage.owner-gid: 107
server.allow-insecure: on

For now everything seems to work fine. I have mounted the gfs volume on both nodes and run the VMs in it.
But today I noticed that the second node (virt2) is offline:

[root@s-virt1 ~]# gluster volume status
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       3090
Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       3105
Self-heal Daemon on virt2.local             N/A       N/A        Y       3140

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@s-virt1 ~]# gluster volume status gfsvol1 detail
Status of volume: gfsvol1
------------------------------------------------------------------------------
Brick                : Brick virt1.local:/gfsvol1/brick1
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 3090
File System          : xfs
Device               : /dev/mapper/rl-gfsvol1
Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 146.4GB
Total Disk Space     : 999.9GB
Inode Count          : 307030856
Free Inodes          : 307026149
------------------------------------------------------------------------------
Brick                : Brick virt2.local:/gfsvol1/brick1
TCP Port             : N/A
RDMA Port            : N/A
Online               : N
Pid                  : N/A
File System          : xfs
Device               : /dev/mapper/rl-gfsvol1
Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 146.4GB
Total Disk Space     : 999.9GB
Inode Count          : 307052016
Free Inodes          : 307047307

What does it mean? What's wrong? Is this normal, or am I missing some setting?
If you need more information, let me know.
Many thanks for your help