Re: volume process does not start - glusterfs is happy with it?

Felix Kölzow <felix.koelzow@xxxxxx> · Wed, 1 Jul 2020 19:57:22 +0200

Hey,

What about the device mapper? Was everything mounted properly during the reboot?

This happens to me when the LVM device mapper hits a timeout during the reboot process while mounting the brick itself.
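A minimal sketch of that check, assuming the brick filesystem is mounted at /00.STORAGE/2 (a guess from the brick path in the original post; the actual mount point is not stated in the thread). The helper name is_mounted is made up for illustration. It only looks for the mount point in /proc/mounts:

```shell
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed as a mount target in MOUNTS_FILE
# (defaults to /proc/mounts). Hypothetical helper, not part of gluster.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Assumption: the brick lives on a filesystem mounted here; adjust as needed.
BRICK_MOUNT=/00.STORAGE/2

if is_mounted "$BRICK_MOUNT"; then
    echo "$BRICK_MOUNT is mounted"
else
    echo "$BRICK_MOUNT is NOT mounted; check 'journalctl -b' for LVM/device-mapper timeouts"
fi
```

If the mount turns out to be late or missing at boot, adding a RequiresMountsFor= line for that path to the existing glusterd drop-in may be worth trying, so that systemd orders glusterd after the mount.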

Regards,

Felix

On 01/07/2020 16:46, lejeczek wrote:

On 30/06/2020 11:31, Barak Sason Rofman wrote:
Greetings,

I'm not sure whether this is directly related to your problem,
but in general, AFAIK, replica-2 volumes are not
recommended because of the possibility of split-brain:
https://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/

It's recommended to use either replica-3 or an arbiter volume.
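As a rough sketch, a replica-2 volume can usually be converted by adding a single arbiter brick with add-brick. The snippet below only prints the command instead of running it. The third peer arbiter.direct and its brick path are hypothetical and do not appear in this thread; arbiter_cmd is a made-up helper name.

```shell
#!/bin/sh
# Print (do not run) the add-brick command that would turn a replica-2
# volume into replica 3 with one arbiter brick.
# $1 = volume name, $2 = arbiter brick as host:/path
arbiter_cmd() {
    printf 'gluster volume add-brick %s replica 3 arbiter 1 %s\n' "$1" "$2"
}

# Hypothetical third peer and brick path; substitute your own.
arbiter_cmd USERs arbiter.direct:/00.STORAGE/arbiter/USERs
```

The new peer would need to be part of the trusted pool first (gluster peer probe), and the arbiter brick stores only metadata, so it can be small.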

Regards,

On Tue, Jun 30, 2020 at 1:14 PM lejeczek <peljasz@xxxxxxxxxxx> wrote:

     Hi everybody.

     I have two peers in the cluster and a 2-replica volume
     which seems okay, except for one weird bit:
     when a peer reboots, then on that peer after the reboot I
     see:

     $ gluster volume status USERs
     Status of volume: USERs
     Gluster process                             TCP Port  RDMA Port  Online  Pid
     ------------------------------------------------------------------------------
     Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
     SERs                                        N/A       N/A        N       N/A
     Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
     USERs                                       49152     0          Y       57338
     Self-heal Daemon on localhost               N/A       N/A        Y       4302
     Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359

     Task Status of Volume USERs
     ------------------------------------------------------------------------------
     There are no active volume tasks

     I do not suppose it's expected.
     On such rebooted node I see:
     $ systemctl status -l glusterd
     ● glusterd.service - GlusterFS, a clustered file-system server
        Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
       Drop-In: /etc/systemd/system/glusterd.service.d
                └─override.conf
        Active: active (running) since Mon 2020-06-29 21:37:36 BST; 13h ago
          Docs: man:glusterd(8)
       Process: 4071 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status>
      Main PID: 4086 (glusterd)
         Tasks: 20 (limit: 101792)
        Memory: 28.9M
        CGroup: /system.slice/glusterd.service
                ├─4086 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
                └─4302 /usr/sbin/glusterfs -s localhost --volfile-id shd/USERs -p /var/run/gluster/shd/USERs/USERs-shd.pid -l /var/log/g>

     Jun 29 21:37:36 swir.private.pawel systemd[1]: Starting GlusterFS, a clustered file-system server...
     Jun 29 21:37:36 swir.private.pawel systemd[1]: Started GlusterFS, a clustered file-system server.

     And I do not see any other apparent problems or errors.
     On that node I manually run:
     $ systemctl restart glusterd.service
     and...

     $ gluster volume status USERs
     Status of volume: USERs
     Gluster process                             TCP Port  RDMA Port  Online  Pid
     ------------------------------------------------------------------------------
     Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
     SERs                                        49152     0          Y       103225
     Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
     USERs                                       49152     0          Y       57338
     Self-heal Daemon on localhost               N/A       N/A        Y       103270
     Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359

     Is it not a puzzle? I'm on glusterfs-7.6-1.el8.x86_64.
     I hope somebody can share some thoughts.
     many thanks, L.

That cannot be it!? If the root cause of this problem is
the 2-replica volume, then it would be a massive cock-up! Then
2-replica volumes should be banned and forbidden.

I hope someone can suggest a way to troubleshoot it.
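One possible starting point, sketched under assumptions: filter the "gluster volume status" output for bricks whose Online column is N, then read that brick's log. The helper name offline_bricks is made up for illustration. The parsing assumes the column layout shown in this thread, including long brick names wrapped onto a second line.

```shell
#!/bin/sh
# offline_bricks: read "gluster volume status VOL" text on stdin and print
# every brick whose Online column is "N". Hypothetical helper.
offline_bricks() {
    awk '
        $1 == "Brick" {
            name = $2
            # Short brick names fit on one line together with the columns.
            if (NF >= 6) { if ($(NF-1) == "N") print name; name = "" }
            next
        }
        name != "" {
            # Continuation line: rest of the brick name, then the columns.
            if ($(NF-1) == "N") print name $1
            name = ""
        }
    '
}

# Only query gluster when the CLI is actually installed on this host.
if command -v gluster >/dev/null 2>&1; then
    gluster volume status USERs | offline_bricks
fi
```

For any brick listed, the brick log (usually under /var/log/glusterfs/bricks/ on that node) should say why the process did not come up. "gluster volume start USERs force" is a common way to start only the missing brick process without restarting glusterd.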

PS. We all, I presume, know the problems of 2-replica volumes.

many thanks, L.

     ________

     Community Meeting Calendar:

     Schedule -
     Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
     Bridge: https://bluejeans.com/441850968

     Gluster-users mailing list
     Gluster-users@xxxxxxxxxxx
     https://lists.gluster.org/mailman/listinfo/gluster-users

--
*Barak Sason Rofman*

Gluster Storage Development

Red Hat Israel <https://www.redhat.com/>

34 Jerusalem rd. Ra'anana, 43501

bsasonro@xxxxxxxxxx <mailto:adi@xxxxxxxxxx>
T: +972-9-7692304
M: +972-52-4326355

