I have set up a similar test environment with two VMs on my PC, identical to the one in production.
Everything works fine.
But when I restart node 2, everything starts and works fine, except that the volume status of node 2 is offline:
[root@virt2 ~]# gluster volume status gfsvol1
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       970
Self-heal Daemon on virt1.local             N/A       N/A        Y       8608

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks
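To see why the brick process is not running, the glusterd journal and the tail of the brick log can be checked, for example (log path taken from this setup):

journalctl -b -u glusterd                                 # glusterd messages since this boot
tail -n 50 /var/log/glusterfs/bricks/gfsvol1-brick1.log   # last brick-side errors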
I have found this workaround: when I re-run "gluster volume start gfsvol1 force", the volume comes back online:
[root@virt2 ~]# gluster volume start gfsvol1 force
volume start: gfsvol1: success
[root@virt2 ~]# gluster volume status gfsvol1
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
Brick virt2.local:/gfsvol1/brick1           49153     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       970
Self-heal Daemon on virt1.local             N/A       N/A        Y       8608

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks
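Since this is a replica 2 volume, after forcing the brick back online it is probably worth confirming that self-heal has caught up (standard gluster CLI, volume name as above):

gluster volume heal gfsvol1 info    # entries still pending heal on each brick
gluster volume heal gfsvol1         # trigger a heal if anything is pending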
But if I reboot the node 2 server, the volume is offline again as soon as the system has started.
Also, if I restart the glusterd service on node 2, the volume comes back online:
systemctl restart glusterd
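To verify the suspected ordering, systemd can show when glusterd started relative to the network units (assuming a standard systemd setup, nothing gluster-specific):

systemd-analyze critical-chain glusterd.service        # time chain of units leading up to glusterd
systemctl list-dependencies --after glusterd.service   # units glusterd is ordered after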
It seems to be a systemd service startup ordering problem: glusterd appears to be started before the network is online.
I have tried modifying the systemd unit:
- from this
After=network.target
Before=network-online.target
- to this
After=network.target network-online.target
#Before=network-online.target
and now when I restart the node 2 server, everything works fine and the volume is always online.
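One caveat with editing the unit file directly: a package update may overwrite /usr/lib/systemd/system/glusterd.service, and a plain drop-in cannot undo the packaged Before=network-online.target (After=/Before= lines are additive in drop-ins). A full override copy seems the safer way to keep the change; a sketch:

systemctl edit --full glusterd.service
# copies the unit to /etc/systemd/system, which takes precedence
# over /usr/lib and survives package updates; in the copy, apply:
#   After=network.target network-online.target
#   Wants=network-online.target   # ordering alone does not activate the target
systemctl daemon-reload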
What is wrong?
Let me know, and many thanks for the help.
Dario
On Tue, 07/09/2021 at 09:46 +0200, Dario Lesca wrote:
These are the last lines of the /var/log/glusterfs/bricks/gfsvol1-brick1.log log:

[2021-09-06 21:29:02.165238 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
[2021-09-06 21:29:02.165365 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
[2021-09-06 21:29:02.165402 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1
[2021-09-06 21:29:02.179387 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49144 failed (No data available)
[2021-09-06 21:29:02.179451 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}]
[2021-09-06 21:29:02.179877 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0
[2021-09-06 21:29:10.254230 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
[2021-09-06 21:29:10.254283 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
[2021-09-06 21:29:10.254300 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1
[2021-09-06 21:29:10.272069 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49140 failed (No data available)
[2021-09-06 21:29:10.272133 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}]
[2021-09-06 21:29:10.272430 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0

I have a reserved network adapter directly connecting the two servers, with dedicated IPs 172.16.3.1/30 and 172.16.3.2/30, named virt1.local and virt2.local via /etc/hosts.
In these logs I also see the real server name (... HOST:s-virt1.realdomain.it-PC_NAME: ...), which has another IP on another network.
This cluster is now in production and hosts some VMs.
What is the best way to solve this dangerous situation without risk?
Many thanks
Dario

On Tue, 07/09/2021 at 05:28 +0000, Strahil Nikolov wrote:
No, it's not normal.
Go to virt2; in the /var/log/glusterfs directory you will find 'bricks'. Check the logs in bricks for more information.
Best Regards,
Strahil Nikolov
On Tue, Sep 7, 2021 at 1:13, Dario Lesca <d.lesca@xxxxxxxxxx> wrote:
Hello everybody!
I'm a novice with gluster. I have set up my first cluster with two nodes.
This is the current volume info:

[root@s-virt1 ~]# gluster volume info gfsvol1
Volume Name: gfsvol1
Type: Replicate
Volume ID: 5bad4a23-58cc-44d7-8195-88409720b941
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: virt1.local:/gfsvol1/brick1
Brick2: virt2.local:/gfsvol1/brick1
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
storage.owner-uid: 107
storage.owner-gid: 107
server.allow-insecure: on

For now everything seems to work fine. I have mounted the gfs volume on both nodes and run the VMs in it.
But today I noticed that the second node (virt2) is offline:

[root@s-virt1 ~]# gluster volume status
Status of volume: gfsvol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick virt1.local:/gfsvol1/brick1           49152     0          Y       3090
Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       3105
Self-heal Daemon on virt2.local             N/A       N/A        Y       3140

Task Status of Volume gfsvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@s-virt1 ~]# gluster volume status gfsvol1 detail
Status of volume: gfsvol1
------------------------------------------------------------------------------
Brick                : Brick virt1.local:/gfsvol1/brick1
TCP Port             : 49152
RDMA Port            : 0
Online               : Y
Pid                  : 3090
File System          : xfs
Device               : /dev/mapper/rl-gfsvol1
Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 146.4GB
Total Disk Space     : 999.9GB
Inode Count          : 307030856
Free Inodes          : 307026149
------------------------------------------------------------------------------
Brick                : Brick virt2.local:/gfsvol1/brick1
TCP Port             : N/A
RDMA Port            : N/A
Online               : N
Pid                  : N/A
File System          : xfs
Device               : /dev/mapper/rl-gfsvol1
Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
Inode Size           : 512
Disk Space Free      : 146.4GB
Total Disk Space     : 999.9GB
Inode Count          : 307052016
Free Inodes          : 307047307

What does it mean? What's wrong? Is this normal, or am I missing some setting?
If you need more information, let me know.
Many thanks for your help