Re: Fwd: Replica brick not working

On Thu, Dec 8, 2016 at 11:17 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:


On Thu, Dec 8, 2016 at 10:22 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
I was able to fix the sync by rsync-ing all the directories; then the heal started. The next problem :), as soon as there are files on the new brick, the gluster mount starts serving reads from it as well, but the new brick is not ready yet since the sync is not done, so clients see missing files. I temporarily removed the new brick and am now running a manual rsync; I will add the brick again and hope this works.

What mechanism manages this? I guess there is something built in to make a replica brick available for reads only once the data is completely synced.
This mechanism was introduced in 3.7.9 or 3.7.10 (http://review.gluster.org/#/c/13806/). Before that version, you needed to manually set some xattrs on the bricks so that healing could happen in parallel while the client would still serve reads from the original brick. I can't find the link to the doc which describes these steps for setting the xattrs. :-(

Oh, is this an addition of bricks?
Just do the following:
1) Bring the new brick down by killing it.
2) On the root of the mount point (let's call it /mnt), do:

mkdir /mnt/<name-of-nonexistent-dir>
rmdir /mnt/<name-of-nonexistent-dir>
setfattr -n trusted.non-existent-key -v abc /mnt
setfattr -x trusted.non-existent-key  /mnt

3) Start the volume using: "gluster volume start <volname> force"

This will trigger the heal, which will make sure everything is healed and that applications only see correct data.

Since you did an explicit rsync, there is no guarantee that things will work as expected. We will add the steps above to the documentation.
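Put together, the sequence looks roughly like this (the volume name, mount point, and brick PID below are placeholders; adapt them to your setup):

```shell
# 1) Stop the glusterfsd process serving the new brick
#    (find its PID in `gluster volume status <volname>` and kill it).
kill <brick-pid>

# 2) On the root of the mount point, touch entry and metadata state
#    so the bricks get marked as needing heal:
mkdir /mnt/nonexistent-dir
rmdir /mnt/nonexistent-dir
setfattr -n trusted.non-existent-key -v abc /mnt
setfattr -x trusted.non-existent-key /mnt

# 3) Restart the killed brick and trigger the heal:
gluster volume start <volname> force
```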




Calling it a day,
Ravi


- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic@xxxxxxxx
Skype: milos.cuculovic.mdpi

On 08.12.2016 16:17, Ravishankar N wrote:
On 12/08/2016 06:53 PM, Atin Mukherjee wrote:


On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
<cuculovic@xxxxxxxx> wrote:

    Ah, damn! I found the issue. On the storage server, the storage2
    IP address was wrong; I swapped two digits in the /etc/hosts
    file, sorry for that :(

    I was able to add the brick now, I started the heal, but still no
    data transfer visible.

1. Are the files getting created on the new brick though?
2. Can you provide the output of `getfattr -d -m . -e hex
/data/data-cluster` on both bricks?
3. Is it possible to attach gdb to the self-heal daemon on the original
(old) brick and get a backtrace?
    `gdb -p <pid of the self-heal daemon on the original brick>`
    `thread apply all bt`  --> share this output
    then quit gdb.
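If attaching gdb interactively is awkward, the same backtrace can be captured non-interactively in one shot (the PID is a placeholder, readable from `gluster volume status`):

```shell
# Dump a one-shot backtrace of all threads of the self-heal daemon,
# then detach; nothing is left attached to the process afterwards.
gdb -batch -p <shd-pid> -ex 'thread apply all bt' > shd-backtrace.txt 2>&1
```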


-Ravi

@Ravi/Pranith - can you help here?



    By doing gluster volume status, I have

    Status of volume: storage
    Gluster process                       TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick storage2:/data/data-cluster     49152     0          Y       23101
    Brick storage:/data/data-cluster      49152     0          Y       30773
    Self-heal Daemon on localhost         N/A       N/A        Y       30050
    Self-heal Daemon on storage           N/A       N/A        Y       30792


    Any idea?

    On storage I have:
    Number of Peers: 1

    Hostname: 195.65.194.217
    Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
    State: Peer in Cluster (Connected)
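    To see whether the heal is actually progressing, the pending-heal counts can be checked from either node (the volume name is a placeholder):

```shell
# Lists entries still pending heal on each brick; the list should
# shrink as data is copied over to the new brick.
gluster volume heal <volname> info

# Per-brick count of entries pending heal, as a quick summary:
gluster volume heal <volname> statistics heal-count
```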


    - Kindest regards,

    Milos Cuculovic
    IT Manager

    ---
    MDPI AG
    Postfach, CH-4020 Basel, Switzerland
    Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
    Tel. +41 61 683 77 35
    Fax +41 61 302 89 18
    Email: cuculovic@xxxxxxxx
    Skype: milos.cuculovic.mdpi

    On 08.12.2016 13:55, Atin Mukherjee wrote:

        Can you resend the attachment as a zip? I am unable to extract the
        content. We shouldn't have an empty info file. What does
        `gluster peer status` output say?

        On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
        <cuculovic@xxxxxxxx> wrote:

            I hope you received my last email Atin, thank you!

            - Kindest regards,

            Milos Cuculovic
            IT Manager

            ---
            MDPI AG
            Postfach, CH-4020 Basel, Switzerland
            Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
            Tel. +41 61 683 77 35
            Fax +41 61 302 89 18
            Email: cuculovic@xxxxxxxx
            Skype: milos.cuculovic.mdpi

            On 08.12.2016 10:28, Atin Mukherjee wrote:


                ---------- Forwarded message ----------
                From: *Atin Mukherjee* <amukherj@xxxxxxxxxx>
                Date: Thu, Dec 8, 2016 at 11:56 AM
                Subject: Re: [Gluster-users] Replica brick not working
                To: Ravishankar N <ravishankar@xxxxxxxxxx>
                Cc: Miloš Čučulović - MDPI <cuculovic@xxxxxxxx>,
                Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>,
                gluster-users <gluster-users@xxxxxxxxxxx>




                On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
                <ravishankar@xxxxxxxxxx> wrote:

                    On 12/08/2016 10:43 AM, Atin Mukherjee wrote:

                        From the log snippet:

                        [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
                        [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
                        [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:

                        The last log entry indicates that we hit the
                        code path in gd_addbr_validate_replica_count():

                                if (replica_count == volinfo->replica_count) {
                                        if (!(total_bricks % volinfo->dist_leaf_count)) {
                                                ret = 1;
                                                goto out;
                                        }
                                }


                    It seems unlikely that this snippet was hit, because
                    we print the E [MSGID: 106291] in the above message
                    only if ret == -1. gd_addbr_validate_replica_count()
                    returns -1 without populating err_str only when
                    volinfo->type doesn't match any of the known volume
                    types, so perhaps volinfo->type is corrupted?


                You are right, I missed that ret is set to 1 here in
                the above snippet.

                @Milos - Can you please provide us the volume info file
                from /var/lib/glusterd/vols/<volname>/ from all three
                nodes to continue the analysis?



                    -Ravi

                        @Pranith, Ravi - Milos was trying to convert a
                        dist (1 x 1) volume to a replicate (1 x 2)
                        using add-brick and hit this issue where
                        add-brick failed. The cluster is operating with
                        3.7.6. Could you help on what scenario this
                        code path can be hit? One straightforward issue
                        I see here is the missing err_str in this path.






                --

                ~ Atin (atinm)









--
Pranith



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
