Re: volume process does not start - glusterfs is happy with it?

Strahil Nikolov <hunter86_bg@xxxxxxxxx> · Wed, 01 Jul 2020 21:33:27 +0300

Sometimes the brick comes up more slowly than the glusterd service (which starts the brick processes).
The problem is that if you make glusterd depend on both bricks and one brick fails (for example, because of a filesystem problem), then the other brick will not come up either.
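
To see whether a node is in that state, one option is to check which brick filesystems are mounted before touching glusterd. This is only a minimal sketch: the function name, the CHECK_CMD variable and the brick paths in the usage comment are assumptions for illustration, not something glusterd provides.

```shell
#!/bin/sh
# Print every brick mount point that is NOT currently mounted.
# CHECK_CMD is a hypothetical knob so the check can be swapped out;
# by default it is `mountpoint -q` from util-linux.
CHECK_CMD="${CHECK_CMD:-mountpoint -q}"

unmounted_bricks() {
    for ub_path in "$@"; do
        # $CHECK_CMD is left unquoted on purpose so that
        # "mountpoint -q" is split into a command and its option.
        if ! $CHECK_CMD "$ub_path" 2>/dev/null; then
            printf '%s\n' "$ub_path"
        fi
    done
}

# Usage (the brick paths below are assumptions):
#   missing=$(unmounted_bricks /gluster_bricks/engine /gluster_bricks/data)
#   [ -z "$missing" ] && systemctl restart glusterd.service
```

If the function prints nothing, all bricks are mounted and restarting glusterd (as done further down in this thread) should bring the brick processes up.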

After a system crash, the VDO service's log replay was taking too long and glusterd failed (as the bricks were not ready yet), so I just created an override like this one:

[root@ovirt1 ~]# cat /etc/systemd/system/glusterd.service.d/01-dependencies.conf
[Unit]
Description=GlusterFS, a clustered file-system server
After=network.target rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount gluster_bricks-fast1.mount gluster_bricks-fast2.mount gluster_bricks-fast3.mount gluster_bricks-fast4.mount
Before=network-online.target

I created the systemd mount units myself because of VDO, but most probably systemd will generate the mount units for you from fstab (they are then pulled in by local-fs.target).
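
If you have many bricks, the After= line can be generated instead of typed. Below is a rough sketch, assuming the bricks are mounted under paths like /gluster_bricks/engine (inferred from the unit names above). The two function names are made up for illustration, and the name mapping is simplified: it only handles path components made of letters, digits and "_". For anything else, use `systemd-escape -p --suffix=mount PATH`, which implements the real escaping rules.

```shell
#!/bin/sh
# Hypothetical helper: map a mount point to its systemd mount unit name.
# Simplified: only the "/" -> "-" translation is done, so paths that
# contain "-" or other special characters are NOT handled correctly.
mount_unit_for() {
    mu_path="${1#/}"          # drop the leading slash
    mu_path="${mu_path%/}"    # drop a trailing slash, if any
    printf '%s.mount\n' "$(printf '%s' "$mu_path" | tr '/' '-')"
}

# Build an After= line for the drop-in from a list of brick mount points.
# After= only orders glusterd behind the mounts; it does not make
# glusterd fail when one of the mounts fails.
after_line_for() {
    al_units="network.target rpcbind.service"
    for al_mp in "$@"; do
        al_units="$al_units $(mount_unit_for "$al_mp")"
    done
    printf 'After=%s\n' "$al_units"
}

# Example with assumed brick mount points:
after_line_for /gluster_bricks/engine /gluster_bricks/data
```

The printed line can be pasted into the [Unit] section of the drop-in, followed by `systemctl daemon-reload`.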

Best Regards,

Strahil Nikolov

On 1 July 2020, 20:57:22 GMT+03:00, "Felix Kölzow" <felix.koelzow@xxxxxx> wrote:
>Hey,
>
>
>what about the device mapper? Was everything mounted properly during
>the reboot?
>
>It happens to me if the LVM device mapper hits a timeout during the
>reboot process while mounting the brick itself.
>
>
>Regards,
>
>Felix
>
>On 01/07/2020 16:46, lejeczek wrote:
>>
>> On 30/06/2020 11:31, Barak Sason Rofman wrote:
>>> Greetings,
>>>
>>> I'm not sure if that's directly related to your problem,
>>> but on a general level, AFAIK, replica-2 vols are not
>>> recommended due to split brain possibility:
>>>
>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>>>
>>> It's recommended to use either replica-3 or an arbiter.
>>>
>>> Regards,
>>>
>>> On Tue, Jun 30, 2020 at 1:14 PM lejeczek
>>> <peljasz@xxxxxxxxxxx <mailto:peljasz@xxxxxxxxxxx>> wrote:
>>>
>>>      Hi everybody.
>>>
>>>      I have two peers in the cluster and a 2-replica volume
>>>      which seems okay, were it not for one weird bit:
>>>      when a peer reboots, then on that peer after the reboot I
>>>      see:
>>>
>>>      $ gluster volume status USERs
>>>      Status of volume: USERs
>>>      Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>      ------------------------------------------------------------------------------
>>>      Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
>>>      SERs                                        N/A       N/A        N       N/A
>>>      Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
>>>      USERs                                       49152     0          Y       57338
>>>      Self-heal Daemon on localhost               N/A       N/A        Y       4302
>>>      Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359
>>>
>>>      Task Status of Volume USERs
>>>      ------------------------------------------------------------------------------
>>>      There are no active volume tasks
>>>
>>>      I do not suppose that is expected.
>>>      On the rebooted node I see:
>>>      $ systemctl status -l glusterd
>>>      ● glusterd.service - GlusterFS, a clustered file-system server
>>>         Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
>>>        Drop-In: /etc/systemd/system/glusterd.service.d
>>>                 └─override.conf
>>>         Active: active (running) since Mon 2020-06-29 21:37:36 BST; 13h ago
>>>           Docs: man:glusterd(8)
>>>        Process: 4071 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status>
>>>       Main PID: 4086 (glusterd)
>>>          Tasks: 20 (limit: 101792)
>>>         Memory: 28.9M
>>>         CGroup: /system.slice/glusterd.service
>>>                 ├─4086 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>>                 └─4302 /usr/sbin/glusterfs -s localhost --volfile-id shd/USERs -p /var/run/gluster/shd/USERs/USERs-shd.pid -l /var/log/g>
>>>
>>>      Jun 29 21:37:36 swir.private.pawel systemd[1]: Starting GlusterFS, a clustered file-system server...
>>>      Jun 29 21:37:36 swir.private.pawel systemd[1]: Started GlusterFS, a clustered file-system server.
>>>
>>>      And I do not see any other apparent problems or errors.
>>>      On that node I manually ran:
>>>      $ systemctl restart glusterd.service
>>>      and...
>>>
>>>      $ gluster volume status USERs
>>>      Status of volume: USERs
>>>      Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>      ------------------------------------------------------------------------------
>>>      Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
>>>      SERs                                        49152     0          Y       103225
>>>      Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
>>>      USERs                                       49152     0          Y       57338
>>>      Self-heal Daemon on localhost               N/A       N/A        Y       103270
>>>      Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359
>>>
>>>      Is it not a puzzle??? I'm on glusterfs-7.6-1.el8.x86_64.
>>>      I hope somebody can share some thoughts.
>>>      Many thanks, L.
>>>
>> That cannot be it!? If the root cause of this problem is
>> the 2-replica volume, then it would be a massive cock-up! Then
>> 2-replica volumes should be banned and forbidden.
>>
>> I hope someone can suggest a way to troubleshoot it.
>>
>> PS. We all, I presume, know the problems of 2-replica volumes.
>>
>> many thanks, L.
>>
>>
>>>      ________
>>>
>>>
>>>
>>>      Community Meeting Calendar:
>>>
>>>      Schedule -
>>>      Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>      Bridge: https://bluejeans.com/441850968
>>>
>>>      Gluster-users mailing list
>>>      Gluster-users@xxxxxxxxxxx
>>>      <mailto:Gluster-users@xxxxxxxxxxx>
>>>      https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>> --
>>> *Barak Sason Rofman*
>>>
>>> Gluster Storage Development
>>>
>>> Red Hat Israel <https://www.redhat.com/>
>>>
>>> 34 Jerusalem rd. Ra'anana, 43501
>>>
>>> bsasonro@xxxxxxxxxx <mailto:adi@xxxxxxxxxx>
>>>    T: _+972-9-7692304_
>>> M: _+972-52-4326355_
>>>
>>>