Re: Replica brick not working

From the log snippet:

[2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
[2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
[2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:

The last log entry indicates that we hit this code path in gd_addbr_validate_replica_count ():

                if (replica_count == volinfo->replica_count) {
                        if (!(total_bricks % volinfo->dist_leaf_count)) {
                                ret = 1;
                                goto out;
                        }
                }

@Pranith, Ravi - Milos was trying to convert a distribute (1 x 1) volume to a replicate (1 x 2) volume using add-brick and hit this issue where add-brick failed. The cluster is running 3.7.6. Could you help identify in what scenario this code path can be hit? One straightforward issue I see here is the missing err_str in this path.
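
For context, here is a minimal sketch of what populating err_str in that branch could look like; the err_str/err_len parameter names and the message wording are my assumptions based on the other validation branches, not an actual patch:

                if (replica_count == volinfo->replica_count) {
                        if (!(total_bricks % volinfo->dist_leaf_count)) {
                                /* Assumed fix sketch: fill err_str so the CLI
                                 * reports a reason instead of the bare
                                 * "Operation failed". err_str/err_len are
                                 * assumed to be the buffer and size passed
                                 * into this validation function. */
                                snprintf (err_str, err_len,
                                          "Volume %s is already of replica "
                                          "count %d; add-brick request does "
                                          "not change the replica count",
                                          volinfo->volname, replica_count);
                                ret = 1;
                                goto out;
                        }
                }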



On Wed, Dec 7, 2016 at 7:56 PM, Miloš Čučulović - MDPI <cuculovic@xxxxxxxx> wrote:
Sure Atin, logs are attached.

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic@xxxxxxxx
Skype: milos.cuculovic.mdpi

On 07.12.2016 11:32, Atin Mukherjee wrote:
Milos,

Giving snippets wouldn't help much; could you get me all the log files
(/var/log/glusterfs/*) from both nodes?

On Wed, Dec 7, 2016 at 3:54 PM, Miloš Čučulović - MDPI
<cuculovic@xxxxxxxx> wrote:

    Thanks, here is the log after volume force:

    [2016-12-07 10:23:39.157234] I [MSGID: 115036]
    [server.c:552:server_rpc_notify] 0-storage-server: disconnecting
    connection from
    storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
    [2016-12-07 10:23:39.157301] I [MSGID: 101055]
    [client_t.c:419:gf_client_unref] 0-storage-server: Shutting down
    connection
    storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
    [2016-12-07 10:23:40.187805] I [login.c:81:gf_auth] 0-auth/login:
    allowed user names: ef4e608d-487b-49a3-85dd-0b36b3554312
    [2016-12-07 10:23:40.187848] I [MSGID: 115029]
    [server-handshake.c:612:server_setvolume] 0-storage-server: accepted
    client from
    storage2-23679-2016/12/07-10:23:40:160327-storage-client-0-0-0
    (version: 3.7.6)
    [2016-12-07 10:23:52.817529] E [MSGID: 113001]
    [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
    /data/data-cluster/dms/submissions/User - 226485:
    key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
    supported]
    [2016-12-07 10:23:52.817598] E [MSGID: 113001]
    [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
    /data/data-cluster/dms/submissions/User - 226485 failed [Operation
    not supported]
    [2016-12-07 10:23:52.821388] E [MSGID: 113001]
    [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
    /data/data-cluster/dms/submissions/User -
    226485/815a39ccc2cb41dadba45fe7c1e226d4:
    key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
    supported]
    [2016-12-07 10:23:52.821434] E [MSGID: 113001]
    [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
    /data/data-cluster/dms/submissions/User -
    226485/815a39ccc2cb41dadba45fe7c1e226d4 failed [Operation not supported]

    - Kindest regards,

    Milos Cuculovic
    IT Manager

    ---
    MDPI AG
    Postfach, CH-4020 Basel, Switzerland
    Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
    Tel. +41 61 683 77 35
    Fax +41 61 302 89 18
    Email: cuculovic@xxxxxxxx
    Skype: milos.cuculovic.mdpi

    On 07.12.2016 11:19, Atin Mukherjee wrote:

        You are referring to the wrong log file, which is for the self-heal
        daemon. You'd need to get back with the brick log file.

        On Wed, Dec 7, 2016 at 3:45 PM, Miloš Čučulović - MDPI
        <cuculovic@xxxxxxxx> wrote:

            This is the log file after the force command:


            [2016-12-07 10:14:55.945937] W
        [glusterfsd.c:1236:cleanup_and_exit]
            (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x770a)
        [0x7fe9d905570a]
            -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40810d]
            -->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407f8d] ) 0-:
            received signum (15), shutting down
            [2016-12-07 10:14:56.960573] I [MSGID: 100030]
            [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started running
            /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs -s
            localhost --volfile-id gluster/glustershd -p
            /var/lib/glusterd/glustershd/run/glustershd.pid -l
            /var/log/glusterfs/glustershd.log -S
            /var/run/gluster/2599dc977214c2895ef1b090a26c1518.socket
            --xlator-option
            *replicate*.node-uuid=7c988af2-9f76-4843-8e6f-d94866d57bb0)
            [2016-12-07 10:14:56.968437] I [MSGID: 101190]
            [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
            thread with index 1
            [2016-12-07 10:14:56.969774] I
        [graph.c:269:gf_add_cmdline_options]
            0-storage-replicate-0: adding option 'node-uuid' for volume
            'storage-replicate-0' with value
        '7c988af2-9f76-4843-8e6f-d94866d57bb0'
            [2016-12-07 10:14:56.985257] I [MSGID: 101190]
            [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
            thread with index 2
            [2016-12-07 10:14:56.986105] I [MSGID: 114020]
            [client.c:2118:notify] 0-storage-client-0: parent
        translators are
            ready, attempting connect on transport
            [2016-12-07 10:14:56.986668] I [MSGID: 114020]
            [client.c:2118:notify] 0-storage-client-1: parent
        translators are
            ready, attempting connect on transport
            Final graph:

        +------------------------------------------------------------------------------+
              1: volume storage-client-0
              2:     type protocol/client
              3:     option ping-timeout 42
              4:     option remote-host storage2
              5:     option remote-subvolume /data/data-cluster
              6:     option transport-type socket
              7:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
              8:     option password dda0bdbf-95c1-4206-a57d-686756210170
              9: end-volume
             10:
             11: volume storage-client-1
             12:     type protocol/client
             13:     option ping-timeout 42
             14:     option remote-host storage
             15:     option remote-subvolume /data/data-cluster
             16:     option transport-type socket
             17:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
             18:     option password dda0bdbf-95c1-4206-a57d-686756210170
             19: end-volume
             20:
             21: volume storage-replicate-0
             22:     type cluster/replicate
             23:     option node-uuid 7c988af2-9f76-4843-8e6f-d94866d57bb0
             24:     option background-self-heal-count 0
             25:     option metadata-self-heal on
             26:     option data-self-heal on
             27:     option entry-self-heal on
             28:     option self-heal-daemon enable
             29:     option iam-self-heal-daemon yes
            [2016-12-07 10:14:56.987096] I
        [rpc-clnt.c:1847:rpc_clnt_reconfig]
            0-storage-client-0: changing port to 49152 (from 0)
             30:     subvolumes storage-client-0 storage-client-1
             31: end-volume
             32:
             33: volume glustershd
             34:     type debug/io-stats
             35:     subvolumes storage-replicate-0
             36: end-volume
             37:

        +------------------------------------------------------------------------------+
            [2016-12-07 10:14:56.987685] E [MSGID: 114058]
            [client-handshake.c:1524:client_query_portmap_cbk]
            0-storage-client-1: failed to get the port number for remote
            subvolume. Please run 'gluster volume status' on server to
        see if
            brick process is running.
            [2016-12-07 10:14:56.987766] I [MSGID: 114018]
            [client.c:2042:client_rpc_notify] 0-storage-client-1:
        disconnected
            from storage-client-1. Client process will keep trying to
        connect to
            glusterd until brick's port is available
            [2016-12-07 10:14:56.988065] I [MSGID: 114057]
            [client-handshake.c:1437:select_server_supported_programs]
            0-storage-client-0: Using Program GlusterFS 3.3, Num (1298437),
            Version (330)
            [2016-12-07 10:14:56.988387] I [MSGID: 114046]
            [client-handshake.c:1213:client_setvolume_cbk]
        0-storage-client-0:
            Connected to storage-client-0, attached to remote volume
            '/data/data-cluster'.
            [2016-12-07 10:14:56.988409] I [MSGID: 114047]
            [client-handshake.c:1224:client_setvolume_cbk]
        0-storage-client-0:
            Server and Client lk-version numbers are not same, reopening
        the fds
            [2016-12-07 10:14:56.988476] I [MSGID: 108005]
            [afr-common.c:3841:afr_notify] 0-storage-replicate-0: Subvolume
            'storage-client-0' came back up; going online.
            [2016-12-07 10:14:56.988581] I [MSGID: 114035]
            [client-handshake.c:193:client_set_lk_version_cbk]
            0-storage-client-0: Server lk version = 1


            - Kindest regards,

            Milos Cuculovic
            IT Manager

            ---
            MDPI AG
            Postfach, CH-4020 Basel, Switzerland
            Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
            Tel. +41 61 683 77 35
            Fax +41 61 302 89 18
            Email: cuculovic@xxxxxxxx
            Skype: milos.cuculovic.mdpi

            On 07.12.2016 11:09, Atin Mukherjee wrote:



                On Wed, Dec 7, 2016 at 3:37 PM, Miloš Čučulović - MDPI
                <cuculovic@xxxxxxxx> wrote:

                    Hi Atin,

                    thanks for your reply.

                    I have been trying to debug it since yesterday, and today
                    I completely purged glusterfs-server from the storage server.

                    I installed it again and checked the firewall; the current
                    status is now as follows:

                    On storage2, I am running:
                    sudo gluster volume add-brick storage replica 2
                    storage:/data/data-cluster
                    Answer => volume add-brick: failed: Operation failed
                    cmd_history says:
                    [2016-12-07 09:57:28.471009]  : volume add-brick storage
                replica 2
                    storage:/data/data-cluster : FAILED : Operation failed

                    glustershd.log => no new entry when running the
                    add-brick command.

                    etc-glusterfs-glusterd.vol.log =>
                    [2016-12-07 10:01:56.567564] I [MSGID: 106482]
                    [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
                0-management:
                    Received add brick req
                    [2016-12-07 10:01:56.567626] I [MSGID: 106062]
                    [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
                0-management:
                    replica-count is 2
                    [2016-12-07 10:01:56.567655] E [MSGID: 106291]
                    [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
                0-management:


                    As for the logs from storage (the new server), there is no
                    relevant entry when I run the add-brick command on storage2.


                    Now, after reinstalling glusterfs-server on storage,
        I can
                see on
                    storage2:

                    Status of volume: storage
                    Gluster process                       TCP Port  RDMA Port  Online  Pid
                    ------------------------------------------------------------------------------
                    Brick storage2:/data/data-cluster     49152     0          Y       2160
                    Self-heal Daemon on localhost         N/A       N/A        Y       7906

                    Task Status of Volume storage
                    ------------------------------------------------------------------------------
                    There are no active volume tasks


                    By running "gluster volume start storage force", do I risk
                    breaking storage2? This is a production server and needs to
                    stay live.


                No, it's going to bring up the brick process(es) if they're
                not up.


                    - Kindest regards,

                    Milos Cuculovic
                    IT Manager

                    ---
                    MDPI AG
                    Postfach, CH-4020 Basel, Switzerland
                    Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
                    Tel. +41 61 683 77 35
                    Fax +41 61 302 89 18
                    Email: cuculovic@xxxxxxxx
                    Skype: milos.cuculovic.mdpi

                    On 07.12.2016 10:44, Atin Mukherjee wrote:



                        On Tue, Dec 6, 2016 at 10:08 PM, Miloš Čučulović - MDPI
                        <cuculovic@xxxxxxxx> wrote:

                            Dear All,

                            I have two servers, storage and storage2.
                            The storage2 had a volume called storage.
                            I then decided to add a replica brick (storage).

                            I did this in the following way:

                            1. sudo gluster peer probe storage (on
        storage server2)
                            2. sudo gluster volume add-brick storage
        replica 2
                            storage:/data/data-cluster

                            Then I was getting the following error:
                            volume add-brick: failed: Operation failed

                            But it seems the brick was somehow added, as when
                            checking on storage2:
                            sudo gluster volume info storage
                            I am getting:
                            Status: Started
                            Number of Bricks: 1 x 2 = 2
                            Transport-type: tcp
                            Bricks:
                            Brick1: storage2:/data/data-cluster
                            Brick2: storage:/data/data-cluster


                            So, it seems OK here. However, when doing:
                            sudo gluster volume heal storage info
                            I am getting:
                            Volume storage is not of type replicate/disperse
                            Volume heal failed.


                            Also, when doing
                            sudo gluster volume status all

                            I am getting:
                            Status of volume: storage
                            Gluster process                       TCP Port  RDMA Port  Online  Pid
                            ------------------------------------------------------------------------------
                            Brick storage2:/data/data-cluster     49152     0          Y       2160
                            Brick storage:/data/data-cluster      N/A       N/A        N       N/A
                            Self-heal Daemon on localhost         N/A       N/A        Y       7906
                            Self-heal Daemon on storage           N/A       N/A        N       N/A

                            Task Status of Volume storage
                            ------------------------------------------------------------------------------

                            Any idea please?


                        It looks like the brick didn't come up during the
                        add-brick. Could you share cmd_history, the glusterd log
                        and the new brick log file from both nodes? As a
                        workaround, could you try 'gluster volume start storage
                        force' and see if the issue persists?



                            --
                            - Kindest regards,

                            Milos Cuculovic
                            IT Manager

                            ---
                            MDPI AG
                            Postfach, CH-4020 Basel, Switzerland
                            Office: St. Alban-Anlage 66, 4052 Basel,
        Switzerland
                            Tel. +41 61 683 77 35
                            Fax +41 61 302 89 18
                            Email: cuculovic@xxxxxxxx
                            Skype: milos.cuculovic.mdpi
                            _______________________________________________
                            Gluster-users mailing list
                            Gluster-users@xxxxxxxxxxx

                http://www.gluster.org/mailman/listinfo/gluster-users




                        --

                        ~ Atin (atinm)




                --

                ~ Atin (atinm)




        --

        ~ Atin (atinm)




--

~ Atin (atinm)



--

~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
