Re: add-brick: failed: Commit failed

Hi Ravi,

Thank you, that seems to have resolved the issue. After doing this, "gluster volume status all" showed gfs3 as online with a port and PID, but it didn't show any sync activity happening. At that point we loaded gfs3 with new firewall rules which explicitly allowed access from gfs1 and gfs2, and then "gluster volume status all" showed the files syncing. The gfs3 server should have allowed access from gfs1 and gfs2 by default anyway, but I now believe that perhaps this wasn't the case, and it may have been a firewall issue all along.
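For anyone hitting the same thing, the rules only need to allow Gluster's standard ports from the other two nodes. A minimal sketch using firewalld, run on gfs3 (a rough sketch rather than our exact rules; adjust the brick port range and zone for your setup):

firewall-cmd --permanent --add-port=24007-24008/tcp    # glusterd management
firewall-cmd --permanent --add-port=49152-49251/tcp    # brick ports, counting up from 49152
firewall-cmd --reload

Restricting these to the gfs1 and gfs2 addresses with rich rules is tidier, but the port list is the important part.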

Thanks for all your help.


On Sat, 25 May 2019 at 01:49, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

Hi David,

On 23/05/19 3:54 AM, David Cunningham wrote:
Hi Ravi,

Please see the log attached.
When I grep -E "Connected to |disconnected from" gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1" message. It looks like this temporary mount is not able to connect to the 2nd brick, which is why the lookup is failing due to lack of quorum.
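To run the same check yourself on gfs3 (a sketch; the log is assumed to be in the usual /var/log/glusterfs location, and a successful attempt should log a "Connected to" line for each of gvol0-client-0, -1 and -2):

grep -E "Connected to |disconnected from" /var/log/glusterfs/gvol0-add-brick-mount.log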
The output of "gluster volume status" is as follows. Should there be something listening on gfs3? I'm not sure whether its TCP Port and Pid showing as N/A is a symptom or a cause. Thank you.

# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A 

Can you see if the following steps help?

1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on both gfs1 and gfs2.

2. `gluster volume start gvol0 force`

3. Check if Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why.
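Putting the three steps together, the sequence would look roughly like this (a sketch; the getfattr and heal-info lines are only there to confirm the xattr took and to watch the heal, and are not strictly required):

On gfs1 and gfs2:

setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0
getfattr -n trusted.afr.gvol0-client-2 -e hex /nodirectwritedata/gluster/gvol0    # confirm the new value

Then from any one node:

gluster volume start gvol0 force
gluster volume status gvol0    # Brick-3 should now show a TCP port and PID
gluster volume heal gvol0 info    # pending entries should drain as gfs3 is healed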

Thanks,

Ravi


Self-heal Daemon on localhost               N/A       N/A        Y       19853
Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
Self-heal Daemon on gfs2                    N/A       N/A        Y       17614
 
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks


On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

If you are trying this again, please run `gluster volume set $volname client-log-level DEBUG` before attempting the add-brick, and attach the gvol0-add-brick-mount.log here. After that, you can change the client-log-level back to INFO.
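In other words, something like the following (a sketch, substituting gvol0 for $volname and reusing the add-brick command from earlier in this thread):

gluster volume set gvol0 client-log-level DEBUG
gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
gluster volume set gvol0 client-log-level INFO

and then attach gvol0-add-brick-mount.log (assumed to be under /var/log/glusterfs on gfs3).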

-Ravi

On 22/05/19 11:32 AM, Ravishankar N wrote:


On 22/05/19 11:23 AM, David Cunningham wrote:
Hi Ravi,

I'd already done exactly that before, where step 3 was a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the cleanup or reformat should be?
`rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me, David. Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not have any extended attributes set on it. Why fuse_first_lookup() is failing is a bit of a mystery to me at this point. :-(
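A quick way to confirm the directory is clean before re-adding it (a sketch; run on gfs3) is below: the first command should show an empty directory and the second should print no trusted.* attributes.

ls -la /nodirectwritedata/gluster/gvol0
getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0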
Regards,
Ravi

Thank you.


On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

Hmm, so the volume info seems to indicate that the add-brick was successful but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail).

Do you want to try removing and adding it again?

1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1

2. Check that `gluster volume info` is now back to a 1x2 volume on all nodes and that `gluster peer status` shows the peers connected on all nodes.

3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.

4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.

5. Check that the files are getting healed on to the new brick.
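For step 5, a quick way to watch the heal (a sketch; run from any node): the new brick should come online with a port and PID in the status output, and the heal-info list should shrink over time.

gluster volume status gvol0
gluster volume heal gvol0 info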

Thanks,
Ravi
On 22/05/19 6:50 AM, David Cunningham wrote:
Hi Ravi,

Certainly. On the existing two nodes:

gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol0-client-0=0x000000000000000000000000
trusted.afr.gvol0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

On the new node:

gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
getfattr: Removing leading '/' from absolute path names
# file: nodirectwritedata/gluster/gvol0
trusted.afr.dirty=0x000000000000000000000001
trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

Output of "gluster volume info" is the same on all 3 nodes and is:

# gluster volume info
 
Volume Name: gvol0
Type: Replicate
Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs1:/nodirectwritedata/gluster/gvol0
Brick2: gfs2:/nodirectwritedata/gluster/gvol0
Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet


On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Hi David,
Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`?

Thanks,
Ravi
On 22/05/19 4:57 AM, David Cunningham wrote:
Hi Sanju,

Here's what glusterd.log says on the new arbiter server when trying to add the node:

[2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
[2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
[2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick

As before gvol0-add-brick-mount.log said:

[2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9
[2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down
[2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'.
[2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.

Here are the processes running on the new arbiter server:
# ps -ef | grep gluster
root      3466     1  0 20:13 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root      6832     1  0 May16 ?        00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     17841     1  0 May16 ?        00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs

Here are the files created on the new arbiter server:
# find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0
drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs

Thank you for your help!


On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srakonde@xxxxxxxxxx> wrote:
David,

Can you please attach the glusterd logs? As the error message says, the commit failed on the arbiter node, so we might be able to find the issue on that node.

On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:


On Fri, 17 May 2019 at 06:01, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
Hello,

We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below.

We are running glusterfs 5.6.1. Thanks in advance for any assistance!

On existing node gfs1, trying to add new arbiter node gfs3:

# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
volume add-brick: failed: Commit failed on gfs3. Please check log file for details.

This looks like a glusterd issue. Please check the glusterd logs for more info.
Adding the glusterd dev to this thread. Sanju, can you take a look?
 
Regards,
Nithya

On new node gfs3 in gvol0-add-brick-mount.log:

[2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
[2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f
[2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down
[2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'.
[2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'.

Processes running on new node gfs3:

# ps -ef | grep gluster
root      6832     1  0 20:17 ?        00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root     15799     1  0 20:17 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
root     16856 16735  0 21:21 pts/0    00:00:00 grep --color=auto gluster

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Thanks,
Sanju


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
