Re: Issue with proactive self healing for Erasure coding

Hi again,

In today's Gluster meeting [1] it was decided that 3.7.1 will be released urgently to fix a bug in glusterd. All fixes previously planned for 3.7.1 will be moved to 3.7.2, which will be released soon after.

Xavi

[1] http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html

On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
Hi Xavier,

Thanks for your reply. When can we expect the 3.7.1 release?

AFAIK a beta of 3.7.1 will be released very soon.


cheers
Backer

On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:

    Hi,

    Some Input/Output error issues have been identified and fixed. These
    fixes will be available in 3.7.1.

    Xavi


    On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:

        Hi Glusterfs Experts,

        We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs
        cluster. Each node has 36 drives; please find the volume info below.

        Volume Name: vaulttest5
        Type: Distributed-Disperse
        Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
        Status: Started
        Number of Bricks: 36 x (8 + 2) = 360
        Transport-type: tcp
        Bricks:
        Brick1: 10.1.2.1:/media/disk1
        Brick2: 10.1.2.2:/media/disk1
        Brick3: 10.1.2.3:/media/disk1
        Brick4: 10.1.2.4:/media/disk1
        Brick5: 10.1.2.5:/media/disk1
        Brick6: 10.1.2.6:/media/disk1
        Brick7: 10.1.2.7:/media/disk1
        Brick8: 10.1.2.8:/media/disk1
        Brick9: 10.1.2.9:/media/disk1
        Brick10: 10.1.2.10:/media/disk1
        Brick11: 10.1.2.1:/media/disk2
        Brick12: 10.1.2.2:/media/disk2
        Brick13: 10.1.2.3:/media/disk2
        Brick14: 10.1.2.4:/media/disk2
        Brick15: 10.1.2.5:/media/disk2
        Brick16: 10.1.2.6:/media/disk2
        Brick17: 10.1.2.7:/media/disk2
        Brick18: 10.1.2.8:/media/disk2
        Brick19: 10.1.2.9:/media/disk2
        Brick20: 10.1.2.10:/media/disk2
        ...
        ....
        Brick351: 10.1.2.1:/media/disk36
        Brick352: 10.1.2.2:/media/disk36
        Brick353: 10.1.2.3:/media/disk36
        Brick354: 10.1.2.4:/media/disk36
        Brick355: 10.1.2.5:/media/disk36
        Brick356: 10.1.2.6:/media/disk36
        Brick357: 10.1.2.7:/media/disk36
        Brick358: 10.1.2.8:/media/disk36
        Brick359: 10.1.2.9:/media/disk36
        Brick360: 10.1.2.10:/media/disk36
        Options Reconfigured:
        performance.readdir-ahead: on
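        For reference, a Distributed-Disperse 36 x (8 + 2) layout like the one
        above is normally created by listing the bricks in subvolume order,
        roughly as follows (a sketch based on the brick list above; the exact
        command used for this volume is not shown here):

            gluster volume create vaulttest5 disperse-data 8 redundancy 2 transport tcp \
                10.1.2.1:/media/disk1 10.1.2.2:/media/disk1 ... 10.1.2.10:/media/disk1 \
                10.1.2.1:/media/disk2 ... 10.1.2.10:/media/disk36
            gluster volume start vaulttest5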

        We did some performance testing and simulated proactive self healing
        for erasure coding. The disperse volume was created across the nodes.

        Description of problem

        I disconnected the network of two nodes and tried to write some video
        files, and glusterfs wrote the video files to the remaining 8 nodes
        perfectly. I tried to download an uploaded file and it downloaded
        perfectly. Then I re-enabled the network of the two nodes; the
        proactive self-healing mechanism worked and wrote the missing chunks
        of data to the re-enabled nodes from the other 8 nodes. But when I
        tried to download the same file again, it showed an Input/Output
        error and I couldn't download the file. I think there is an issue in
        proactive self healing.
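
        As a side note, heal progress after re-enabling the nodes can be
        checked from the CLI; the commands below are generic suggestions
        using this volume's name, not output from this test:

            gluster volume heal vaulttest5 info      # entries still pending heal, per brick
            gluster volume status vaulttest5         # confirm all bricks are online again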

        We also tried the simulation with a single-node network failure and
        faced the same I/O error while downloading the file.


        Error while downloading the file

        root@master02:/home/admin# rsync -r --progress
        /mnt/gluster/file13_AN
        ./1/file13_AN-2

        sending incremental file list

        file13_AN

            3,342,355,597 100% 4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)

        rsync: read errors mapping "/mnt/gluster/file13_AN":
        Input/output error (5)

        WARNING: file13_AN failed verification -- update discarded (will
        try again).

        root@master02:/home/admin# cp /mnt/gluster/file13_AN
        ./1/file13_AN-3

        cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error

        cp: failed to extend ‘./1/file13_AN-3’: Input/output error
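
        When the FUSE mount returns Input/output errors like this, the client
        log usually records the underlying reason. Assuming the default log
        location and a mount point of /mnt/gluster, something like the
        following shows the recent error-level entries:

            grep " E " /var/log/glusterfs/mnt-gluster.log | tail -n 50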


        We can't conclude whether the issue is with glusterfs 3.7.0 or with
        our glusterfs configuration.

        Any help would be greatly appreciated

        --
        Cheers
        Backer









_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users