Re: peer rejected but connected

hi, still tricky

Whether or not I remove "tier-enabled=0" on the rejected peer, when I try to restart the glusterd service there, the restart fails:

glusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2017-09-01 07:41:08.251314] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-09-01 07:41:08.251400] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-09-01 07:41:08.275000] W [MSGID: 103071] [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2017-09-01 07:41:08.275071] W [MSGID: 103055] [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device
[2017-09-01 07:41:08.275096] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2017-09-01 07:41:08.275307] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-09-01 07:41:08.275343] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-09-01 07:41:13.941020] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2017-09-01 07:41:14.109192] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-09-01 07:41:14.109364] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-09-01 07:41:14.109481] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-09-01 07:41:14.134691] E [MSGID: 106187] [glusterd-store.c:4559:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-09-01 07:41:14.134769] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-09-01 07:41:14.134790] E [MSGID: 101066] [graph.c:325:glusterfs_graph_init] 0-management: initializing translator failed
[2017-09-01 07:41:14.134804] E [MSGID: 101176] [graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-09-01 07:41:14.135723] W [glusterfsd.c:1332:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x55f22fab3abd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x55f22fab3961] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55f22fab2e4b] ) 0-: received signum (1), shutting down
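The fatal entry there is "resolve brick failed in restore": while reading back the stored volumes, glusterd could not map some brick's host to a known peer. A quick way to see which brick that is (a sketch; these are the standard glusterd work-directory paths):

    # every stored brick has a file naming its host; each such host must
    # match an entry in the peer store, or the restore fails as above
    grep -H '^hostname=' /var/lib/glusterd/vols/*/bricks/*
    ls /var/lib/glusterd/peers/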

I have to wipe /var/lib/glusterd clean on the rejected peer (10.5.6.17) before I can restart it, but... then I probe it anew, "tier-enabled=0" lands in the "info" file for each vol on 10.5.6.17 again, and... vicious circle?
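An alternative to the full wipe that may break the loop (a sketch, assuming the volume definitions on the three healthy peers are authoritative; 10.5.6.18 is a hypothetical stand-in for one of them):

    # on the rejected peer 10.5.6.17: stop glusterd, replace only the
    # per-volume metadata with a copy from a good node, and keep this
    # peer's own identity (glusterd.info) untouched
    systemctl stop glusterd
    rsync -av --delete 10.5.6.18:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    systemctl start glusterd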



On 01/09/17 07:30, Gaurav Yadav wrote:
The logs from the newly added node helped me in the RCA of the issue.

The info file on node 10.5.6.17 contains an additional property, "tier-enabled", which is not present in the info files on the other 3 nodes. When a gluster peer probe call is made, a cksum of that file is compared in order to maintain consistency across the cluster. In this case the two files differ, leading to different cksums, which results in "State: Peer Rejected (Connected)".

This inconsistency arose from the upgrade you did.
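To see the mismatch directly, you can compare the stored files yourself (a sketch; CO-DATA is the volume named in your logs, and 10.5.6.16 below is just a placeholder for any of the three healthy peers):

    # the cksum glusterd compares is kept next to the info file
    cat /var/lib/glusterd/vols/CO-DATA/cksum

    # the only difference between a good node's info file and the one on
    # 10.5.6.17 should be the tier-enabled=0 line
    GOOD_NODE=10.5.6.16   # placeholder, use a real healthy peer
    diff <(ssh $GOOD_NODE cat /var/lib/glusterd/vols/CO-DATA/info) \
         /var/lib/glusterd/vols/CO-DATA/info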

Workaround (see the shell sketch below):
1. Go to node 10.5.6.17.
2. Open the info file "/var/lib/glusterd/vols/<vol-name>/info" and remove "tier-enabled=0".
3. Restart the glusterd service.
4. Peer probe again.
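
In shell form, something like this (a sketch; steps 1-3 run on 10.5.6.17, step 4 from a node already in the cluster):

    # steps 1+2: drop the tier-enabled line from every volume's info file
    sed -i '/^tier-enabled=/d' /var/lib/glusterd/vols/*/info

    # step 3: restart glusterd
    systemctl restart glusterd

    # step 4: from an existing cluster node, probe again and verify
    gluster peer probe 10.5.6.17
    gluster peer status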

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:

    attached the lot as per your request.

    Would be really great if you could find the root cause of this and suggest a resolution. Fingers crossed.
    thanks, L.

    On 31/08/17 05:34, Gaurav Yadav wrote:

        Could you please send the entire content of the "/var/lib/glusterd/" directory of the 4th node which is being peer probed, along with command-history and glusterd.logs.
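
        For example (a sketch; the log locations below are the usual defaults and may differ on your system):

            # on the probed node, bundle glusterd's state and logs
            tar czf glusterd-state.tar.gz \
                /var/lib/glusterd \
                /var/log/glusterfs/glusterd.log \
                /var/log/glusterfs/cmd_history.log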

        Thanks
        Gaurav

        On Wed, Aug 30, 2017 at 7:10 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:



            On 30/08/17 07:18, Gaurav Yadav wrote:


                Could you please send me the "info" file, which is placed in the "/var/lib/glusterd/vols/<vol-name>" directory, from all the nodes, along with glusterd.logs and command-history.

                Thanks
                Gaurav

                On Tue, Aug 29, 2017 at 7:13 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:

                    hi fellas,
                    same old same
                    in log of the probing peer I see:
                    ...
                    [2017-08-29 13:36:16.882196] I [MSGID: 106493] [glusterd-handler.c:3020:__glusterd_handle_probe_query] 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x, op_ret: 0, op_errno: 0, ret: 0
                    [2017-08-29 13:36:16.904961] I [MSGID: 106490] [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 2a17edb4-ae68-4b67-916e-e38a2087ca28
                    [2017-08-29 13:36:16.906477] E [MSGID: 106010] [glusterd-utils.c:3034:glusterd_compare_friend_volume] 0-management: Version of Cksums CO-DATA differ. local cksum = 4088157353, remote cksum = 2870780063 on peer 10.5.6.17
                    [2017-08-29 13:36:16.907187] I [MSGID: 106493] [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.5.6.17 (0), ret: 0, op_ret: -1
                    ...

                    Why would adding a new peer make the cluster jump to checking checksums on a vol on that newly added peer?


            Really, I mean, no brick even exists on the newly added peer, it's only just been probed, so why this?:

            [2017-08-30 13:17:51.949430] E [MSGID: 106010] [glusterd-utils.c:3034:glusterd_compare_friend_volume] 0-management: Version of Cksums CO-DATA differ. local cksum = 4088157353, remote cksum = 2870780063 on peer 10.5.6.17

            10.5.6.17 is a candidate I'm probing from a working cluster.
            Why does gluster want checksums, and why would the checksums be different?
            Would anybody know what is going on there?


                    Is that why the peer gets rejected?
                    The peer I'm hoping to add was a member of the cluster in the past, but I did the "usual" wipe of /var/lib/glusterd on the candidate peer.

                    A hint or a solution would be great to hear.
                    L.








_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



