Re: Rebalance + VM corruption - current status and request for feedback

Just to be clear, the release notes still carry the warning about this, and the code to use force when doing rebalance is still in place.

Now that we have received feedback that this works, these will be removed in subsequent minor releases for the various streams, as appropriate.

Thanks,
Shyam

On 06/05/2017 07:36 AM, Gandalf Corvotempesta wrote:
Great, thanks!

On 5 Jun 2017 at 6:49 AM, "Krutika Dhananjay" <kdhananj@xxxxxxxxxx> wrote:

    The fixes are already available in 3.10.2, 3.8.12 and 3.11.0

    -Krutika

    On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta
    <gandalf.corvotempesta@xxxxxxxxx> wrote:

        Great news.
        Is this planned to be published in the next release?

        On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" <kdhananj@xxxxxxxxxx> wrote:

            Thanks for that update. Very happy to hear it ran fine
            without any issues. :)

            Yeah so you can ignore those 'No such file or directory'
            errors. They represent a transient state where DHT in the
            client process is yet to figure out the new location of the
            file.
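
            If you want to double-check that they are only transient,
            one rough way (the log path below is hypothetical; use the
            log file of your actual mount) is to confirm the count of
            these warnings stops growing once the rebalance settles:

                grep -c 'MSGID: 114031' \
                    /var/log/glusterfs/rhev-data-center-mnt-glusterSD-s1:_testvol.log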

            -Krutika


            On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan
            <mahdi.adnan@xxxxxxxxxxx> wrote:

                Hello,


                Yes, I forgot to upgrade the client as well.

                I did the upgrade and created a new volume with the same
                options as before, with one VM running and doing lots of
                I/O. I started the rebalance with force, and after the
                process completed I rebooted the VM; it started normally
                without issues.

                I repeated the process and did another rebalance while
                the VM was running, and everything went fine.

                But the client logs are throwing lots of warning
                messages:


                [2017-05-29 13:14:59.416382] W [MSGID: 114031]
                [client-rpc-fops.c:2928:client3_3_lookup_cbk]
                2-gfs_vol2-client-2: remote operation failed. Path:
                /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
                (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
                directory]
                [2017-05-29 13:14:59.416427] W [MSGID: 114031]
                [client-rpc-fops.c:2928:client3_3_lookup_cbk]
                2-gfs_vol2-client-3: remote operation failed. Path:
                /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
                (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
                directory]
                [2017-05-29 13:14:59.808251] W [MSGID: 114031]
                [client-rpc-fops.c:2928:client3_3_lookup_cbk]
                2-gfs_vol2-client-2: remote operation failed. Path:
                /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
                (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
                directory]
                [2017-05-29 13:14:59.808287] W [MSGID: 114031]
                [client-rpc-fops.c:2928:client3_3_lookup_cbk]
                2-gfs_vol2-client-3: remote operation failed. Path:
                /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
                (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
                directory]



                Although the process went smoothly, I will run another
                extensive test tomorrow just to be sure.

                --

                Respectfully,
                Mahdi A. Mahdi

                ------------------------------------------------------------------------
                From: Krutika Dhananjay <kdhananj@xxxxxxxxxx>
                Sent: Monday, May 29, 2017 9:20:29 AM
                To: Mahdi Adnan
                Cc: gluster-user; Gandalf Corvotempesta; Lindsay
                Mathieson; Kevin Lemonnier
                Subject: Re: Rebalance + VM corruption - current
                status and request for feedback

                Hi,

                I took a look at your logs.
                It very much looks like an issue caused by a mismatch
                between the glusterfs client and server packages.
                Your client (mount) still seems to be running 3.7.20,
                as confirmed by the following log messages:

                [2017-05-26 08:58:23.647458] I [MSGID: 100030]
                [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
                running /usr/sbin/glusterfs version 3.7.20 (args:
                /usr/sbin/glusterfs --volfile-server=s1
                --volfile-server=s2 --volfile-server=s3
                --volfile-server=s4 --volfile-id=/testvol
                /rhev/data-center/mnt/glusterSD/s1:_testvol)
                [2017-05-26 08:58:40.901204] I [MSGID: 100030]
                [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
                running /usr/sbin/glusterfs version 3.7.20 (args:
                /usr/sbin/glusterfs --volfile-server=s1
                --volfile-server=s2 --volfile-server=s3
                --volfile-server=s4 --volfile-id=/testvol
                /rhev/data-center/mnt/glusterSD/s1:_testvol)
                [2017-05-26 08:58:48.923452] I [MSGID: 100030]
                [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
                running /usr/sbin/glusterfs version 3.7.20 (args:
                /usr/sbin/glusterfs --volfile-server=s1
                --volfile-server=s2 --volfile-server=s3
                --volfile-server=s4 --volfile-id=/testvol
                /rhev/data-center/mnt/glusterSD/s1:_testvol)

                whereas the servers have rightly been upgraded to
                3.10.2, as seen in the rebalance log:

                [2017-05-26 09:36:36.075940] I [MSGID: 100030]
                [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started
                running /usr/sbin/glusterfs version 3.10.2 (args:
                /usr/sbin/glusterfs -s localhost --volfile-id
                rebalance/testvol --xlator-option *dht.use-readdirp=yes
                --xlator-option *dht.lookup-unhashed=yes --xlator-option
                *dht.assert-no-child-down=yes --xlator-option
                *replicate*.data-self-heal=off --xlator-option
                *replicate*.metadata-self-heal=off --xlator-option
                *replicate*.entry-self-heal=off --xlator-option
                *dht.readdir-optimize=on --xlator-option
                *dht.rebalance-cmd=5 --xlator-option
                *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b
                --xlator-option *dht.commit-hash=3376396580 --socket-file
                /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
                --pid-file
                /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
                -l /var/log/glusterfs/testvol-rebalance.log)
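
                A quick way to compare versions across hosts (assuming
                the RPM-based community packages; adjust the queries for
                your distribution) is:

                    glusterfs --version     # on the client/mount host
                    gluster --version       # on each server
                    rpm -qa 'glusterfs*'    # full package list on both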


                Could you upgrade all packages to 3.10.2 and try again?

                -Krutika


                On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan
                <mahdi.adnan@xxxxxxxxxxx> wrote:

                    Hi,


                    Attached are the logs for both the rebalance and the
                    mount.



                    --

                    Respectfully,
                    Mahdi A. Mahdi

                    ------------------------------------------------------------------------
                    From: Krutika Dhananjay <kdhananj@xxxxxxxxxx>
                    Sent: Friday, May 26, 2017 1:12:28 PM
                    To: Mahdi Adnan
                    Cc: gluster-user; Gandalf Corvotempesta; Lindsay
                    Mathieson; Kevin Lemonnier
                    Subject: Re: Rebalance + VM corruption - current
                    status and request for feedback

                    Could you provide the rebalance and mount logs?

                    -Krutika

                    On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan
                    <mahdi.adnan@xxxxxxxxxxx> wrote:

                        Good morning,


                        So I have tested the new Gluster 3.10.2, and
                        after starting the rebalance two VMs were paused
                        due to a storage error and a third one was not
                        responding.

                        After the rebalance completed I started the VMs,
                        but they did not boot and threw an XFS
                        wrong-inode error on the screen.


                        My setup:

                        4 nodes running CentOS 7.3 with Gluster 3.10.2

                        4 bricks in a distributed replicate volume with
                        the virt group set.

                        I added the volume to oVirt, created three VMs,
                        and ran a loop to create 5 GB files inside the
                        VMs.

                        Added 4 new bricks to the existing nodes.

                        Started the rebalance with force to bypass the
                        warning message.

                        VMs started to fail after rebalancing.
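
                        For reference, the sequence was roughly the
                        following (the volume name and brick paths are
                        placeholders, not the exact ones used):

                            gluster volume create testvol replica 2 \
                                s1:/bricks/b1 s2:/bricks/b1 \
                                s3:/bricks/b1 s4:/bricks/b1
                            gluster volume set testvol group virt
                            gluster volume start testvol
                            # ... create VMs in oVirt, generate I/O ...
                            gluster volume add-brick testvol \
                                s1:/bricks/b2 s2:/bricks/b2 \
                                s3:/bricks/b2 s4:/bricks/b2
                            gluster volume rebalance testvol start force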




                        --

                        Respectfully,
                        Mahdi A. Mahdi

                        ------------------------------------------------------------------------
                        From: Krutika Dhananjay <kdhananj@xxxxxxxxxx>
                        Sent: Wednesday, May 17, 2017 6:59:20 AM
                        To: gluster-user
                        Cc: Gandalf Corvotempesta; Lindsay Mathieson;
                        Kevin Lemonnier; Mahdi Adnan
                        Subject: Rebalance + VM corruption - current
                        status and request for feedback

                        Hi,

                        In the past couple of weeks, we've sent the
                        following fixes concerning VM corruption upon
                        doing rebalance -
                        https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051

                        These fixes are very much part of the latest
                        3.10.2 release.

                        Satheesaran at Red Hat also verified that the
                        fixes work, and he is not seeing corruption
                        issues anymore.

                        I'd like to hear feedback from the users
                        themselves on these fixes (on your test
                        environments to begin with) before even changing
                        the status of the bug to CLOSED.

                        Although 3.10.2 has a patch that prevents
                        rebalance sub-commands from being executed on
                        sharded volumes, you can override the check by
                        using the 'force' option.

                        For example,

                        # gluster volume rebalance myvol start force
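
                        Progress can then be monitored with the status
                        sub-command:

                        # gluster volume rebalance myvol status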

                        Very much looking forward to hearing from you all.

                        Thanks,
                        Krutika







_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



