Re: Replica brick not working

From the log snippet:

[2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
[2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
[2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:

The last log entry indicates that we hit this code path in gd_addbr_validate_replica_count ():

                if (replica_count == volinfo->replica_count) {
                        if (!(total_bricks % volinfo->dist_leaf_count)) {
                                ret = 1;
                                goto out;
                        }
                }

@Pranith, Ravi - Milos was trying to convert a distribute (1 x 1) volume to a replicate (1 x 2) volume using add-brick and hit this issue where add-brick failed. The cluster is running 3.7.6. Could you help identify in what scenario this code path can be hit? One straightforward issue I see here is the missing err_str in this path.
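
For context, here is a minimal sketch of what populating err_str in that branch could look like; the err_str/err_len parameter names and the message wording are my assumptions based on the other validation branches, not an actual patch:

                if (replica_count == volinfo->replica_count) {
                        if (!(total_bricks % volinfo->dist_leaf_count)) {
                                /* Assumed fix sketch: fill err_str so the CLI
                                 * reports a reason instead of the bare
                                 * "Operation failed". err_str/err_len are
                                 * assumed to be the buffer and size passed
                                 * into this validation function. */
                                snprintf (err_str, err_len,
                                          "Volume %s is already of replica "
                                          "count %d; add-brick request does "
                                          "not change the replica count",
                                          volinfo->volname, replica_count);
                                ret = 1;
                                goto out;
                        }
                }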



On Wed, Dec 7, 2016 at 7:56 PM, Miloš Čučulović - MDPI <cuculovic@xxxxxxxx> wrote:
Sure Atin, logs are attached.

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic@xxxxxxxx
Skype: milos.cuculovic.mdpi

On 07.12.2016 11:32, Atin Mukherjee wrote:
Milos,

Giving snippets wouldn't help much; could you get me all the log files
(/var/log/glusterfs/*) from both nodes?

On Wed, Dec 7, 2016 at 3:54 PM, Miloš Čučulović - MDPI
<cuculovic@xxxxxxxx> wrote:

    Thanks, here is the log after volume force:

    [2016-12-07 10:23:39.157234] I [MSGID: 115036]
    [server.c:552:server_rpc_notify] 0-storage-server: disconnecting
    connection from
    storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
    [2016-12-07 10:23:39.157301] I [MSGID: 101055]
    [client_t.c:419:gf_client_unref] 0-storage-server: Shutting down
    connection
    storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
    [2016-12-07 10:23:40.187805] I [login.c:81:gf_auth] 0-auth/login:
    allowed user names: ef4e608d-487b-49a3-85dd-0b36b3554312
    [2016-12-07 10:23:40.187848] I [MSGID: 115029]
    [server-handshake.c:612:server_setvolume] 0-storage-server: accepted
    client from
    storage2-23679-2016/12/07-10:23:40:160327-storage-client-0-0-0
    (version: 3.7.6)
    [2016-12-07 10:23:52.817529] E [MSGID: 113001]
    [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
    /data/data-cluster/dms/submissions/User - 226485:
    key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
    supported]
    [2016-12-07 10:23:52.817598] E [MSGID: 113001]
    [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
    /data/data-cluster/dms/submissions/User - 226485 failed [Operation
    not supported]
    [2016-12-07 10:23:52.821388] E [MSGID: 113001]
    [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
    /data/data-cluster/dms/submissions/User -
    226485/815a39ccc2cb41dadba45fe7c1e226d4:
    key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
    supported]
    [2016-12-07 10:23:52.821434] E [MSGID: 113001]
    [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
    /data/data-cluster/dms/submissions/User -
    226485/815a39ccc2cb41dadba45fe7c1e226d4 failed [Operation not supported]

    - Kindest regards,

    Milos Cuculovic
    IT Manager

    ---
    MDPI AG
    Postfach, CH-4020 Basel, Switzerland
    Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
    Tel. +41 61 683 77 35
    Fax +41 61 302 89 18
    Email: cuculovic@xxxxxxxx
    Skype: milos.cuculovic.mdpi

    On 07.12.2016 11:19, Atin Mukherjee wrote:

        You are referring to the wrong log file, which is for the self-heal
        daemon. You'd need to get back with the brick log file.

        On Wed, Dec 7, 2016 at 3:45 PM, Miloš Čučulović - MDPI
        <cuculovic@xxxxxxxx> wrote:

            This is the log file after the force command:


            [2016-12-07 10:14:55.945937] W
        [glusterfsd.c:1236:cleanup_and_exit]
            (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x770a)
        [0x7fe9d905570a]
            -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40810d]
            -->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407f8d] ) 0-:
            received signum (15), shutting down
            [2016-12-07 10:14:56.960573] I [MSGID: 100030]
            [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started running
            /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs -s
            localhost --volfile-id gluster/glustershd -p
            /var/lib/glusterd/glustershd/run/glustershd.pid -l
            /var/log/glusterfs/glustershd.log -S
            /var/run/gluster/2599dc977214c2895ef1b090a26c1518.socket
            --xlator-option
            *replicate*.node-uuid=7c988af2-9f76-4843-8e6f-d94866d57bb0)
            [2016-12-07 10:14:56.968437] I [MSGID: 101190]
            [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
            thread with index 1
            [2016-12-07 10:14:56.969774] I
        [graph.c:269:gf_add_cmdline_options]
            0-storage-replicate-0: adding option 'node-uuid' for volume
            'storage-replicate-0' with value
        '7c988af2-9f76-4843-8e6f-d94866d57bb0'
            [2016-12-07 10:14:56.985257] I [MSGID: 101190]
            [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
            thread with index 2
            [2016-12-07 10:14:56.986105] I [MSGID: 114020]
            [client.c:2118:notify] 0-storage-client-0: parent
        translators are
            ready, attempting connect on transport
            [2016-12-07 10:14:56.986668] I [MSGID: 114020]
            [client.c:2118:notify] 0-storage-client-1: parent
        translators are
            ready, attempting connect on transport
            Final graph:

        +------------------------------------------------------------------------------+
              1: volume storage-client-0
              2:     type protocol/client
              3:     option ping-timeout 42
              4:     option remote-host storage2
              5:     option remote-subvolume /data/data-cluster
              6:     option transport-type socket
              7:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
              8:     option password dda0bdbf-95c1-4206-a57d-686756210170
              9: end-volume
             10:
             11: volume storage-client-1
             12:     type protocol/client
             13:     option ping-timeout 42
             14:     option remote-host storage
             15:     option remote-subvolume /data/data-cluster
             16:     option transport-type socket
             17:     option username ef4e608d-487b-49a3-85dd-0b36b3554312
             18:     option password dda0bdbf-95c1-4206-a57d-686756210170
             19: end-volume
             20:
             21: volume storage-replicate-0
             22:     type cluster/replicate
             23:     option node-uuid 7c988af2-9f76-4843-8e6f-d94866d57bb0
             24:     option background-self-heal-count 0
             25:     option metadata-self-heal on
             26:     option data-self-heal on
             27:     option entry-self-heal on
             28:     option self-heal-daemon enable
             29:     option iam-self-heal-daemon yes
            [2016-12-07 10:14:56.987096] I
        [rpc-clnt.c:1847:rpc_clnt_reconfig]
            0-storage-client-0: changing port to 49152 (from 0)
             30:     subvolumes storage-client-0 storage-client-1
             31: end-volume
             32:
             33: volume glustershd
             34:     type debug/io-stats
             35:     subvolumes storage-replicate-0
             36: end-volume
             37:

        +------------------------------------------------------------------------------+
            [2016-12-07 10:14:56.987685] E [MSGID: 114058]
            [client-handshake.c:1524:client_query_portmap_cbk]
            0-storage-client-1: failed to get the port number for remote
            subvolume. Please run 'gluster volume status' on server to
        see if
            brick process is running.
            [2016-12-07 10:14:56.987766] I [MSGID: 114018]
            [client.c:2042:client_rpc_notify] 0-storage-client-1:
        disconnected
            from storage-client-1. Client process will keep trying to
        connect to
            glusterd until brick's port is available
            [2016-12-07 10:14:56.988065] I [MSGID: 114057]
            [client-handshake.c:1437:select_server_supported_programs]
            0-storage-client-0: Using Program GlusterFS 3.3, Num (1298437),
            Version (330)
            [2016-12-07 10:14:56.988387] I [MSGID: 114046]
            [client-handshake.c:1213:client_setvolume_cbk]
        0-storage-client-0:
            Connected to storage-client-0, attached to remote volume
            '/data/data-cluster'.
            [2016-12-07 10:14:56.988409] I [MSGID: 114047]
            [client-handshake.c:1224:client_setvolume_cbk]
        0-storage-client-0:
            Server and Client lk-version numbers are not same, reopening
        the fds
            [2016-12-07 10:14:56.988476] I [MSGID: 108005]
            [afr-common.c:3841:afr_notify] 0-storage-replicate-0: Subvolume
            'storage-client-0' came back up; going online.
            [2016-12-07 10:14:56.988581] I [MSGID: 114035]
            [client-handshake.c:193:client_set_lk_version_cbk]
            0-storage-client-0: Server lk version = 1


            - Kindest regards,

            Milos Cuculovic
            IT Manager

            ---
            MDPI AG
            Postfach, CH-4020 Basel, Switzerland
            Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
            Tel. +41 61 683 77 35
            Fax +41 61 302 89 18
            Email: cuculovic@xxxxxxxx
            Skype: milos.cuculovic.mdpi

            On 07.12.2016 11:09, Atin Mukherjee wrote:



                On Wed, Dec 7, 2016 at 3:37 PM, Miloš Čučulović - MDPI
                <cuculovic@xxxxxxxx> wrote:

                    Hi Atin,

                    thanks for your reply.

                    I have been trying to debug it since yesterday, and today
                    I completely purged glusterfs-server from the storage server.

                    I installed it again and checked the firewall; the current
                    status is now as follows:

                    On storage2, I am running:
                    sudo gluster volume add-brick storage replica 2
                    storage:/data/data-cluster
                    Answer => volume add-brick: failed: Operation failed
                    cmd_history says:
                    [2016-12-07 09:57:28.471009]  : volume add-brick storage
                replica 2
                    storage:/data/data-cluster : FAILED : Operation failed

                    glustershd.log => no new entry when running the
                    add-brick command.

                    etc-glusterfs-glusterd.vol.log =>
                    [2016-12-07 10:01:56.567564] I [MSGID: 106482]
                    [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
                0-management:
                    Received add brick req
                    [2016-12-07 10:01:56.567626] I [MSGID: 106062]
                    [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
                0-management:
                    replica-count is 2
                    [2016-12-07 10:01:56.567655] E [MSGID: 106291]
                    [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
                0-management:


                    As for the logs from storage (the new server), there is no
                    relevant entry when I run the add-brick command on storage2.


                    Now, after reinstalling glusterfs-server on storage,
        I can
                see on
                    storage2:

                    Status of volume: storage
                    Gluster process                       TCP Port  RDMA Port  Online  Pid
                    ------------------------------------------------------------------------------
                    Brick storage2:/data/data-cluster     49152     0          Y       2160
                    Self-heal Daemon on localhost         N/A       N/A        Y       7906

                    Task Status of Volume storage
                    ------------------------------------------------------------------------------
                    There are no active volume tasks


                    By running "gluster volume start storage force", do I risk
                    breaking storage2? This is a production server and needs to
                    stay live.


                No, it's going to bring up the brick process(es) if they're
                not up.


                    - Kindest regards,

                    Milos Cuculovic
                    IT Manager

                    ---
                    MDPI AG
                    Postfach, CH-4020 Basel, Switzerland
                    Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
                    Tel. +41 61 683 77 35
                    Fax +41 61 302 89 18
                    Email: cuculovic@xxxxxxxx
                    Skype: milos.cuculovic.mdpi

                    On 07.12.2016 10:44, Atin Mukherjee wrote:



                        On Tue, Dec 6, 2016 at 10:08 PM, Miloš Čučulović - MDPI
                        <cuculovic@xxxxxxxx> wrote:

                            Dear All,

                            I have two servers, storage and storage2.
                            The storage2 had a volume called storage.
                            I then decided to add a replica brick (storage).

                            I did this in the following way:

                            1. sudo gluster peer probe storage (on
        storage server2)
                            2. sudo gluster volume add-brick storage
        replica 2
                            storage:/data/data-cluster

                            Then I was getting the following error:
                            volume add-brick: failed: Operation failed

                            But it seems the brick was somehow added, as when
                            checking on storage2:
                            sudo gluster volume info storage
                            I am getting:
                            Status: Started
                            Number of Bricks: 1 x 2 = 2
                            Transport-type: tcp
                            Bricks:
                            Brick1: storage2:/data/data-cluster
                            Brick2: storage:/data/data-cluster


                            So, it seems OK here. However, when doing:
                            sudo gluster volume heal storage info
                            I am getting:
                            Volume storage is not of type replicate/disperse
                            Volume heal failed.


                            Also, when doing
                            sudo gluster volume status all

                            I am getting:
                            Status of volume: storage
                            Gluster process                       TCP Port  RDMA Port  Online  Pid
                            ------------------------------------------------------------------------------
                            Brick storage2:/data/data-cluster     49152     0          Y       2160
                            Brick storage:/data/data-cluster      N/A       N/A        N       N/A
                            Self-heal Daemon on localhost         N/A       N/A        Y       7906
                            Self-heal Daemon on storage           N/A       N/A        N       N/A

                            Task Status of Volume storage
                            ------------------------------------------------------------------------------

                            Any idea please?


                        It looks like the brick didn't come up during the
                        add-brick. Could you share cmd_history, the glusterd log
                        and the new brick log file from both nodes? As a
                        workaround, could you try 'gluster volume start storage
                        force' and see if the issue persists?



                            --
                            - Kindest regards,

                            Milos Cuculovic
                            IT Manager

                            ---
                            MDPI AG
                            Postfach, CH-4020 Basel, Switzerland
                            Office: St. Alban-Anlage 66, 4052 Basel,
        Switzerland
                            Tel. +41 61 683 77 35
                            Fax +41 61 302 89 18
                            Email: cuculovic@xxxxxxxx
                            Skype: milos.cuculovic.mdpi
                            _______________________________________________
                            Gluster-users mailing list
                            Gluster-users@xxxxxxxxxxx

                http://www.gluster.org/mailman/listinfo/gluster-users




                        --

                        ~ Atin (atinm)




                --

                ~ Atin (atinm)




        --

        ~ Atin (atinm)




--

~ Atin (atinm)



--

~ Atin (atinm)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
