Hi Strahil,
Thanks for sharing the steps. I have tried all the steps mentioned except 6 and 9.
Let me try them as well and see how it responds.
Thanks,
Ahemad
On Wednesday, 17 June, 2020, 02:51:48 pm IST, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Hi Ahemad,
Most probably the reason for the unexpected downtime lies somewhere else and you are just observing the symptoms.
So, you have a replica 3 volume on 3 separate hosts, right?
Here is what I think you should do on a TEST cluster (these could even be VMs on your laptop); a consolidated command sketch for steps 1-5 follows the list:
1. Create 1 brick on each VM
2. Create the TSP
3. Create the replica 3 volume
4. Enable & start glusterfsd.service on all VMs
5. Connect another VM via FUSE and use a mount like this one:
mount -t glusterfs -o backup-volfile-servers=vm2:vm3 vm1:/volume1 /mnt
6. Now test hardware failure - power off vm1 ungracefully. The FUSE client should recover in less than a minute - this is governed by the volume's ping timeout (network.ping-timeout, 42 seconds by default)
7. Power up vm1 and check the heal status
8. Once the heal is over you can proceed
9. Test planned maintenance - use the gluster script (stop-all-gluster-processes.sh) to kill all gluster processes on vm2. The FUSE client should not hang and should not notice anything.
10. Start glusterd.service and then forcefully start the brick:
gluster volume start volume1 force
Check status:
gluster volume status volume1
Wait for the heals to complete.
All bricks should be online.
11. Now shut down vm3 gracefully. The glusterfsd.service should kill all gluster processes and the FUSE client should never experience any issues.
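For reference, a minimal command sketch for steps 1-5 (the hostnames vm1/vm2/vm3, the brick path /bricks/brick1 and the volume name volume1 are assumptions for this test cluster):

# step 1 - on each of vm1, vm2 and vm3: prepare a brick directory
mkdir -p /bricks/brick1

# step 2 - build the trusted storage pool (TSP), run from vm1
gluster peer probe vm2
gluster peer probe vm3
gluster peer status

# step 3 - create and start the replica 3 volume
# (add 'force' if the bricks live on the root filesystem of a throwaway test VM)
gluster volume create volume1 replica 3 vm1:/bricks/brick1 vm2:/bricks/brick1 vm3:/bricks/brick1
gluster volume start volume1

# step 4 - on each of the 3 VMs
systemctl enable glusterfsd.service
systemctl start glusterfsd.service

# step 5 - on the client VM
mount -t glusterfs -o backup-volfile-servers=vm2:vm3 vm1:/volume1 /mnt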
The only case where you can observe partial downtime with replica 3 is when there were pending heals and one of the 2 good copies failed/was powered down before the heal completed.
Usually there are 2 healing mechanisms:
1) When a FUSE client accesses a file whose copies differ (2 bricks with the same version, 1 brick with an older version), it will trigger a heal of that file.
2) There is a self-heal daemon that crawls the pending heals every 10 minutes by default (cluster.heal-timeout) and heals anything that is 'blamed'.
You can use 'gluster volume heal volume1 full' to initiate a full heal, but on large bricks it can take a long time; it is usually used after a brick replacement/reset.
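For checking and triggering heals on such a volume, a short sketch (again using volume1):

# list the entries that still need healing
gluster volume heal volume1 info

# trigger an index heal (only the entries already marked as pending)
gluster volume heal volume1

# trigger a full heal - crawls the whole brick, normally only after brick replacement/reset
gluster volume heal volume1 full

# confirm all bricks and self-heal daemons are online
gluster volume status volume1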
Best Regards,
Strahil Nikolov
On 17 June 2020 at 10:45:27 GMT+03:00, ahemad shaik <ahemad_shaik@xxxxxxxxx> wrote:
> Thanks Karthik for the information. Let me try.
>Thanks,
>Ahemad
>On Wednesday, 17 June, 2020, 12:43:29 pm IST, Karthik Subrahmanya
><ksubrahm@xxxxxxxxxx> wrote:
>
> Hi Ahemad,
>Glad to hear that your problem is resolved. Thanks Strahil and Hubert
>for your suggestions.
>
>On Wed, Jun 17, 2020 at 12:29 PM ahemad shaik <ahemad_shaik@xxxxxxxxx>
>wrote:
>
> Hi
>I tried starting and enabling the glusterfsd service as suggested by
>Hubert and Strahil, and I see that it works: when one of the gluster nodes is
>not available, the client is still able to access the mount point.
>Thanks so much Strahil, Hubert and Karthik for your suggestions and for
>your time.
>Can you please help with making the data consistent on all nodes when one of
>the servers has had some 5 hours of downtime? How do we achieve data
>consistency on all 3 nodes?
>When the node/brick which was down comes back up, the gluster self-heal
>daemon (glustershd) will automatically sync the data to
>the brick that was down and make it consistent with the good copies. You can
>alternatively run the index heal command "gluster volume heal
><vol-name>" to trigger the heal manually, and you can see the entries
>needing heal and the progress of the heal by running "gluster volume heal
><vol-name> info".
>HTH,
>Karthik
>
>Any documentation on that end will be helpful.
>Thanks,
>Ahemad
>On Wednesday, 17 June, 2020, 12:03:06 pm IST, Karthik Subrahmanya
><ksubrahm@xxxxxxxxxx> wrote:
>
> Hi Ahemad,
>Sorry for a lot of back and forth on this, but we might need a few more
>details to find the actual cause here. What version of Gluster are you
>running on the server and client nodes? Also, please provide the statedump [1] of
>the bricks and of the client process when the hang is seen.
>[1] https://docs.gluster.org/en/latest/Troubleshooting/statedump/
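For reference, a sketch of how those statedumps can be taken (based on the linked page; the default dump directory /var/run/gluster and the pgrep pattern are assumptions):

# on a server node - dump the state of all brick processes of the volume
gluster volume statedump <vol-name>

# on the client - send SIGUSR1 to the FUSE mount process to make it dump its state
kill -USR1 $(pgrep -f "glusterfs.*<mount-point>")

# the dump files land in /var/run/gluster/ by default
ls -l /var/run/gluster/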
>Regards,
>Karthik
>On Wed, Jun 17, 2020 at 9:25 AM ahemad_shaik@xxxxxxxxx
><ahemad_shaik@xxxxxxxxx> wrote:
>
>I have a 3-replica gluster volume created on 3 nodes, and when one node
>went down due to some issue, the clients were not able to access the volume. That
>was the issue. I have fixed the server and it is back, but there was
>downtime at the client. I just want to avoid that downtime, since it is a 3-way
>replica.
>I am testing the high availability now by rebooting or shutting down one of the
>brick servers manually. I just want to make the volume
>always accessible to the client. That is the reason we went for a replicated
>volume.
>So I would just like to know how to keep the volume highly
>available to clients even when a VM or node hosting a gluster brick goes down
>unexpectedly and has a downtime of 10 hours.
>
>
>The glusterfsd service is used only for stopping; it is disabled in my cluster, and
>I see one more service running, glusterd.
>Will starting the glusterfsd service on all 3 replica nodes help in
>achieving what I am trying to do?
>Hope I am clear.
>Thanks,
>Ahemad
>
>
>Thanks,
>Ahemad
>
>
>
>On Tue, Jun 16, 2020 at 23:12, Strahil Nikolov <hunter86_bg@xxxxxxxxx>
>wrote: In my cluster, the service is enabled and running.
>
>What actually is your problem?
>When a gluster brick process dies unexpectedly, all FUSE clients will
>be waiting for the timeout.
>The glusterfsd service ensures that during system shutdown the
>brick processes are shut down in such a way that all native clients
>won't 'hang' and wait for the timeout, but will directly choose
>another brick.
>
>The same happens when you manually run the kill script - all gluster
>processes shut down and all clients are redirected to another brick.
>
>Keep in mind that FUSE mounts will also be killed, both by the script
>and by the glusterfsd service.
>
>Best Regards,
>Strahil Nikolov
>
>On 16 June 2020 at 19:48:32 GMT+03:00, ahemad shaik <ahemad_shaik@xxxxxxxxx> wrote:
>> Hi Strahil,
>>I have the gluster setup on a CentOS 7 cluster. I see the glusterfsd service
>>and it is in the inactive state:
>>
>>systemctl status glusterfsd.service
>>● glusterfsd.service - GlusterFS brick processes (stopping only)
>>   Loaded: loaded (/usr/lib/systemd/system/glusterfsd.service; disabled; vendor preset: disabled)
>>   Active: inactive (dead)
>>
>>So you mean that starting this service on all the nodes where the gluster
>>volumes are created will solve the issue?
>>
>>Thanks,
>>Ahemad
>>
>>
>>On Tuesday, 16 June, 2020, 10:12:22 pm IST, Strahil Nikolov
>><hunter86_bg@xxxxxxxxx> wrote:
>>
>> Hi ahemad,
>>
>>The script kills all gluster processes, so the clients won't wait
>>for the timeout before switching to another node in the TSP.
>>
>>In CentOS/RHEL, there is a systemd service called
>>'glusterfsd.service' that takes care of killing all gluster
>>processes on shutdown, so clients won't hang.
>>
>>systemctl cat glusterfsd.service --no-pager
>># /usr/lib/systemd/system/glusterfsd.service
>>[Unit]
>>Description=GlusterFS brick processes (stopping only)
>>After=network.target glusterd.service
>>
>>[Service]
>>Type=oneshot
>># glusterd starts the glusterfsd processes on-demand
>># /bin/true will mark this service as started, RemainAfterExit keeps it active
>>ExecStart=/bin/true
>>RemainAfterExit=yes
>># if there are no glusterfsd processes, a stop/reload should not give an error
>>ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
>>ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"
>>
>>[Install]
>>WantedBy=multi-user.target
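On distributions where this unit ships disabled (as in the status output quoted earlier), it would be enabled and started on each gluster node roughly like this:

systemctl enable glusterfsd.service
systemctl start glusterfsd.service
systemctl is-active glusterfsd.service    # should report "active"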
>>
>>Best Regards,
>>Strahil Nikolov
>>
>>On 16 June 2020 at 18:41:59 GMT+03:00, ahemad shaik <ahemad_shaik@xxxxxxxxx> wrote:
>>> Hi,
>>>I see there is a script file at the below-mentioned path on all the nodes
>>>from which the gluster volume was created:
>>>/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
>>>Do I need to create a systemd service and call this script whenever some
>>>server goes down, or does it need to run always, so that when some node is
>>>down it will take care that the client will not have any issues accessing
>>>the mount point?
>>>Can you please share any documentation on how to use this? That would be a
>>>great help.
>>>Thanks,
>>>Ahemad
>>>
>>>
>>>
>>>
>>>On Tuesday, 16 June, 2020, 08:59:31 pm IST, Strahil Nikolov
>>><hunter86_bg@xxxxxxxxx> wrote:
>>>
>>> Hi Ahemad,
>>>
>>>You can simplify it by creating a systemd service that will call
>>>the script.
>>>
>>>It was already mentioned in a previous thread (with example), so
>>>you can just use it.
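Since that earlier example isn't reproduced in this thread, here is a minimal sketch of such a unit, modeled on the glusterfsd.service unit quoted above (the unit name gluster-stop-clean.service is hypothetical, and the script path may differ per distribution):

# /etc/systemd/system/gluster-stop-clean.service  (hypothetical name)
[Unit]
Description=Kill all Gluster processes cleanly at shutdown
After=network.target glusterd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# on system shutdown, ExecStop runs the script so FUSE clients fail over immediately
ExecStop=/bin/sh -c "/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh || /bin/true"

[Install]
WantedBy=multi-user.target

After placing the file, run 'systemctl daemon-reload' and 'systemctl enable gluster-stop-clean.service' on each node.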
>>>
>>>Best Regards,
>>>Strahil Nikolov
>>>
>>>On 16 June 2020 at 16:02:07 GMT+03:00, Hu Bert <revirii@xxxxxxxxxxxxxx>
>>>wrote:
>>>>Hi,
>>>>
>>>>If you simply reboot or shut down one of the gluster nodes, there
>>>>might be a (short or medium) unavailability of the volume on the clients.
>>>>To avoid this, there's a script:
>>>>
>>>>/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh (path may
>>>>be different depending on distribution)
>>>>
>>>>If I remember correctly, this notifies the clients that this node is
>>>>going to be unavailable (please correct me if the details are wrong).
>>>>When I reboot one gluster node, I always call this script first and I have
>>>>never seen unavailability issues on the clients.
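A sketch of that maintenance flow on the node being rebooted (script path as above; it may differ per distribution):

# before rebooting a gluster node for maintenance
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
reboot

# after the node is back up, verify bricks and self-heal daemons, then watch the heal
gluster volume status
gluster volume heal <vol-name> info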
>>>>
>>>>
>>>>Regards,
>>>>Hubert
>>>>
>>>>On Mon, 15 June 2020 at 19:36, ahemad shaik
>>>><ahemad_shaik@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi There,
>>>>>
>>>>> I have created a 3-replica gluster volume with 3 bricks from 3 nodes.
>>>>>
>>>>> "gluster volume create glustervol replica 3 transport tcp node1:/data node2:/data node3:/data force"
>>>>>
>>>>> I mounted it on the client node using the below command.
>>>>>
>>>>> "mount -t glusterfs node4:/glustervol /mnt/"
>>>>>
>>>>> When any of the nodes (node1, node2 or node3) goes down, the gluster
>>>>> mount/volume (/mnt) is not accessible at the client (node4).
>>>>>
>>>>> The purpose of a replicated volume is high availability, but I am not able
>>>>> to achieve it.
>>>>>
>>>>> Is it a bug, or am I missing something?
>>>>>
>>>>>
>>>>> Any suggestions would be a great help!
>>>>>
>>>>> kindly suggest.
>>>>>
>>>>> Thanks,
>>>>> Ahemad
>>>>>
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users