Re: Issue with Pro active self healing for Erasure coding

Hi Xavier

We are facing the same I/O error after upgrading to gluster 3.7.2.

Description of problem:
=======================
In a 3 x (4 + 2) = 18 distributed disperse volume, some files return input/output errors on the FUSE mount after simulating the following scenario:

1.   Simulate a disk failure by killing the brick process pid, then add the same disk back after formatting the drive.
2.   Try to read the recovered (healed) file after 2 bricks/nodes are brought down.
Version-Release number of selected component (if applicable):
==============================================================
admin@node001:~$ sudo gluster --version
glusterfs 3.7.2 built on Jun 19 2015 16:33:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Steps to Reproduce (a rough command sketch follows this list):
1. Create a 3 x (4 + 2) disperse volume across the nodes.
2. FUSE mount on the client and start creating files/directories with mkdir and rsync/dd.
3. Simulate a disk failure by killing the pid of any brick process on one node, then add the same disk back after formatting the drive.
4. Start the volume with force.
5. Self-healing creates the file (with 0 bytes) on the newly formatted drive.
6. Wait for self-healing to finish, but it does not happen; the file stays at 0 bytes.
7. Try to read the same file from the client; the 0-byte file now gets recovered and recovery completes. Get the md5sum of the file with all nodes live and the result is positive.
8. Now bring down 2 of the nodes.
9. Now try to get the md5sum of the same recovered file; the client throws an I/O error.
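
A rough sketch of the commands behind steps 1-4; the device name /dev/sdc and the xfs filesystem are assumptions from our lab setup and are not shown in the output below:

# 1. create the 3 x (4 + 2) volume (18 bricks, 6 per disperse set) and start it
#    ('force' because the bricks sit directly on the mount points)
sudo gluster volume create vaulttest21 disperse 6 redundancy 2 \
    10.1.2.{1..6}:/media/disk1 10.1.2.{1..6}:/media/disk2 10.1.2.{1..6}:/media/disk3 force
sudo gluster volume start vaulttest21

# 2. FUSE mount on the client and write test files
sudo mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster
dd if=/dev/urandom of=/mnt/gluster/up1 bs=1M count=1000

# 3. simulate the disk failure on node3: kill the brick process, reformat, remount
ps ax | grep '[g]lusterfsd' | grep disk2            # note the pid of the /media/disk2 brick
sudo kill -9 <brick-pid>                            # placeholder: pid taken from the line above
sudo umount /media/disk2 && sudo mkfs.xfs -f /dev/sdc && sudo mount /dev/sdc /media/disk2

# 4. restart the killed brick
sudo gluster volume start vaulttest21 force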
Console output:

admin@node001:~$ sudo gluster volume info

Volume Name: vaulttest21
Type: Distributed-Disperse
Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.1:/media/disk2
Brick8: 10.1.2.2:/media/disk2
Brick9: 10.1.2.3:/media/disk2
Brick10: 10.1.2.4:/media/disk2
Brick11: 10.1.2.5:/media/disk2
Brick12: 10.1.2.6:/media/disk2
Brick13: 10.1.2.1:/media/disk3
Brick14: 10.1.2.2:/media/disk3
Brick15: 10.1.2.3:/media/disk3
Brick16: 10.1.2.4:/media/disk3
Brick17: 10.1.2.5:/media/disk3
Brick18: 10.1.2.6:/media/disk3
Options Reconfigured:
performance.readdir-ahead: on

After simulating the disk failure (node3, disk2) and adding the disk again after formatting the drive:

admin@node003:~$ date
Thu Jun 25 16:21:58 IST 2015

admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root   22 Jun 25 16:18 1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4

--

admin@node003:~$ date
Thu Jun 25 16:25:09 IST 2015

admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root   22 Jun 25 16:18 1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4

--

admin@node003:~$ date
Thu Jun 25 16:41:25 IST 2015

admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root   22 Jun 25 16:18 1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4


After waiting nearly 20 minutes, self-healing has still not recovered the full data chunk. We then try to read the file using md5sum:

root@mas03:/mnt/gluster# time md5sum up1
4650543ade404ed5a1171726e76f8b7c  up1

real    1m58.010s
user    0m6.243s
sys     0m0.778s

The 0-byte chunk now starts growing (the read has triggered the recovery):

admin@node003:~$ ls -l -h  /media/disk2
total 2.6G
drwxr-xr-x 3 root root   22 Jun 25 16:18 1
-rw-r--r-- 2 root root 797M Jun 25 15:57 up1
-rw-r--r-- 2 root root    0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
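
For reference, before taking nodes offline the heal state could be confirmed with something like the following (the trusted.ec.* xattr names are our assumption about what the disperse translator stores per brick):

sudo gluster volume heal vaulttest21 info        # should list no pending entries for up1
sudo getfattr -d -m. -e hex /media/disk2/up1     # run on node003 and on a healthy node (e.g. node001),
                                                 # then compare trusted.ec.version / trusted.ec.size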

To verify the healed file after two nodes (5 & 6) were taken offline:

root@mas03:/mnt/gluster# time md5sum up1
md5sum: up1: Input/output error

The I/O error is still not rectified. Could you suggest whether anything is wrong with our testing? The full volume option list is included below for reference.
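
If it helps, we can collect the following as well (the client log path assumes the default FUSE mount log location):

sudo gluster volume status vaulttest21           # confirm which bricks are online after the two nodes went down
sudo gluster volume heal vaulttest21 info        # pending heal entries per brick
tail -n 100 /var/log/glusterfs/mnt-gluster.log   # client-side messages around the Input/output error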


admin@node001:~$ sudo gluster volume get vaulttest21 all
Option                                  Value
------                                  -----
cluster.lookup-unhashed                 on
cluster.lookup-optimize                 off
cluster.min-free-disk                   10%
cluster.min-free-inodes                 5%
cluster.rebalance-stats                 off
cluster.subvols-per-directory           (null)
cluster.readdir-optimize                off
cluster.rsync-hash-regex                (null)
cluster.extra-hash-regex                (null)
cluster.dht-xattr-name                  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid    off
cluster.rebal-throttle                  normal
cluster.local-volume-name               (null)
cluster.weighted-rebalance              on
cluster.entry-change-log                on
cluster.read-subvolume                  (null)
cluster.read-subvolume-index            -1
cluster.read-hash-mode                  1
cluster.background-self-heal-count      16
cluster.metadata-self-heal              on
cluster.data-self-heal                  on
cluster.entry-self-heal                 on
cluster.self-heal-daemon                on
cluster.heal-timeout                    600
cluster.self-heal-window-size           1
cluster.data-change-log                 on
cluster.metadata-change-log             on
cluster.data-self-heal-algorithm        (null)
cluster.eager-lock                      on
cluster.quorum-type                     none
cluster.quorum-count                    (null)
cluster.choose-local                    true
cluster.self-heal-readdir-size          1KB
cluster.post-op-delay-secs              1
cluster.ensure-durability               on
cluster.consistent-metadata             no
cluster.stripe-block-size               128KB
cluster.stripe-coalesce                 true
diagnostics.latency-measurement         off
diagnostics.dump-fd-stats               off
diagnostics.count-fop-hits              off
diagnostics.brick-log-level             INFO
diagnostics.client-log-level            INFO
diagnostics.brick-sys-log-level         CRITICAL
diagnostics.client-sys-log-level        CRITICAL
diagnostics.brick-logger                (null)
diagnostics.client-logger               (null)
diagnostics.brick-log-format            (null)
diagnostics.client-log-format           (null)
diagnostics.brick-log-buf-size          5
diagnostics.client-log-buf-size         5
diagnostics.brick-log-flush-timeout     120
diagnostics.client-log-flush-timeout    120
performance.cache-max-file-size         0
performance.cache-min-file-size         0
performance.cache-refresh-timeout       1
performance.cache-priority
performance.cache-size                  32MB
performance.io-thread-count             16
performance.high-prio-threads           16
performance.normal-prio-threads         16
performance.low-prio-threads            16
performance.least-prio-threads          1
performance.enable-least-priority       on
performance.least-rate-limit            0
performance.cache-size                  128MB
performance.flush-behind                on
performance.nfs.flush-behind            on
performance.write-behind-window-size    1MB
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct             off
performance.nfs.strict-o-direct         off
performance.strict-write-ordering       off
performance.nfs.strict-write-ordering   off
performance.lazy-open                   yes
performance.read-after-open             no
performance.read-ahead-page-count       4
performance.md-cache-timeout            1
features.encryption                     off
encryption.master-key                   (null)
encryption.data-key-size                256
encryption.block-size                   4096
network.frame-timeout                   1800
network.ping-timeout                    42
network.tcp-window-size                 (null)
features.lock-heal                      off
features.grace-timeout                  10
network.remote-dio                      disable
client.event-threads                    2
network.ping-timeout                    42
network.tcp-window-size                 (null)
network.inode-lru-limit                 16384
auth.allow                              *
auth.reject                             (null)
transport.keepalive                     (null)
server.allow-insecure                   (null)
server.root-squash                      off
server.anonuid                          65534
server.anongid                          65534
server.statedump-path                   /var/run/gluster
server.outstanding-rpc-limit            64
features.lock-heal                      off
features.grace-timeout                  (null)
server.ssl                              (null)
auth.ssl-allow                          *
server.manage-gids                      off
client.send-gids                        on
server.gid-timeout                      300
server.own-thread                       (null)
server.event-threads                    2
performance.write-behind                on
performance.read-ahead                  on
performance.readdir-ahead               on
performance.io-cache                    on
performance.quick-read                  on
performance.open-behind                 on
performance.stat-prefetch               on
performance.client-io-threads           off
performance.nfs.write-behind            on
performance.nfs.read-ahead              off
performance.nfs.io-cache                off
performance.nfs.quick-read              off
performance.nfs.stat-prefetch           off
performance.nfs.io-threads              off
performance.force-readdirp              true
features.file-snapshot                  off
features.uss                            off
features.snapshot-directory             .snaps
features.show-snapshot-directory        off
network.compression                     off
network.compression.window-size         -15
network.compression.mem-level           8
network.compression.min-size            0
network.compression.compression-level   -1
network.compression.debug               false
features.limit-usage                    (null)
features.quota-timeout                  0
features.default-soft-limit             80%
features.soft-timeout                   60
features.hard-timeout                   5
features.alert-time                     86400
features.quota-deem-statfs              off
geo-replication.indexing                off
geo-replication.indexing                off
geo-replication.ignore-pid-check        off
geo-replication.ignore-pid-check        off
features.quota                          off
features.inode-quota                    off
features.bitrot                         disable
debug.trace                             off
debug.log-history                       no
debug.log-file                          no
debug.exclude-ops                       (null)
debug.include-ops                       (null)
debug.error-gen                         off
debug.error-failure                     (null)
debug.error-number                      (null)
debug.random-failure                    off
debug.error-fops                        (null)
nfs.enable-ino32                        no
nfs.mem-factor                          15
nfs.export-dirs                         on
nfs.export-volumes                      on
nfs.addr-namelookup                     off
nfs.dynamic-volumes                     off
nfs.register-with-portmap               on
nfs.outstanding-rpc-limit               16
nfs.port                                2049
nfs.rpc-auth-unix                       on
nfs.rpc-auth-null                       on
nfs.rpc-auth-allow                      all
nfs.rpc-auth-reject                     none
nfs.ports-insecure                      off
nfs.trusted-sync                        off
nfs.trusted-write                       off
nfs.volume-access                       read-write
nfs.export-dir
nfs.disable                             false
nfs.nlm                                 on
nfs.acl                                 on
nfs.mount-udp                           off
nfs.mount-rmtab                         /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                           /sbin/rpc.statd
nfs.server-aux-gids                     off
nfs.drc                                 off
nfs.drc-size                            0x20000
nfs.read-size                           (1 * 1048576ULL)
nfs.write-size                          (1 * 1048576ULL)
nfs.readdir-size                        (1 * 1048576ULL)
nfs.exports-auth-enable                 (null)
nfs.auth-refresh-interval-sec           (null)
nfs.auth-cache-ttl-sec                  (null)
features.read-only                      off
features.worm                           off
storage.linux-aio                       off
storage.batch-fsync-mode                reverse-fsync
storage.batch-fsync-delay-usec          0
storage.owner-uid                       -1
storage.owner-gid                       -1
storage.node-uuid-pathinfo              off
storage.health-check-interval           30
storage.build-pgfid                     off
storage.bd-aio                          off
cluster.server-quorum-type              off
cluster.server-quorum-ratio             0
changelog.changelog                     off
changelog.changelog-dir                 (null)
changelog.encoding                      ascii
changelog.rollover-time                 15
changelog.fsync-interval                5
changelog.changelog-barrier-timeout     120
changelog.capture-del-path              off
features.barrier                        disable
features.barrier-timeout                120
features.trash                          off
features.trash-dir                      .trashcan
features.trash-eliminate-path           (null)
features.trash-max-filesize             5MB
features.trash-internal-op              off
cluster.enable-shared-storage           disable
features.ctr-enabled                    off
features.record-counters                off
features.ctr_link_consistency           off
locks.trace                             (null)
cluster.disperse-self-heal-daemon       enable
cluster.quorum-reads                    no
client.bind-insecure                    (null)
ganesha.enable                          off
features.shard                          off
features.shard-block-size               4MB
features.scrub-throttle                 lazy
features.scrub-freq                     biweekly
features.expiry-time                    120
features.cache-invalidation             off
features.cache-invalidation-timeout     60


Thanks & regards
Backer





On Mon, Jun 15, 2015 at 1:26 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
On 06/15/2015 09:25 AM, Mohamed Pakkeer wrote:
Hi Xavier,

When can we expect the 3.7.2 release fixing the I/O error which we
discussed on this mail thread?

As per the latest meeting held last Wednesday [1], it will be released this week.

Xavi

[1] http://meetbot.fedoraproject.org/gluster-meeting/2015-06-10/gluster-meeting.2015-06-10-12.01.html


Thanks
Backer

On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx
<mailto:xhernandez@xxxxxxxxxx>> wrote:

    Hi again,

    in today's gluster meeting [1] it has been decided that 3.7.1 will
    be released urgently to solve a bug in glusterd. All fixes planned
    for 3.7.1 will be moved to 3.7.2 which will be released soon after.

    Xavi

    [1]
    http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html


    On 05/27/2015 12:01 PM, Xavier Hernandez wrote:

        On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:

            Hi Xavier,

            Thanks for your reply. When can we expect the 3.7.1 release?


        AFAIK a beta of 3.7.1 will be released very soon.


            cheers
            Backer

            On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez
            <xhernandez@xxxxxxxxxx <mailto:xhernandez@xxxxxxxxxx>
            <mailto:xhernandez@xxxxxxxxxx

            <mailto:xhernandez@xxxxxxxxxx>>> wrote:

                 Hi,

                 some Input/Output error issues have been identified and
            fixed. These
                 fixes will be available on 3.7.1.

                 Xavi


                 On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:

                     Hi Glusterfs Experts,

                     We are testing glusterfs 3.7.0 tarball on our 10
            Node glusterfs
                     cluster.
                     Each node has 36 drives and please find the volume
            info below

                     Volume Name: vaulttest5
                     Type: Distributed-Disperse
                     Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
                     Status: Started
                     Number of Bricks: 36 x (8 + 2) = 360
                     Transport-type: tcp
                     Bricks:
                     Brick1: 10.1.2.1:/media/disk1
                     Brick2: 10.1.2.2:/media/disk1
                     Brick3: 10.1.2.3:/media/disk1
                     Brick4: 10.1.2.4:/media/disk1
                     Brick5: 10.1.2.5:/media/disk1
                     Brick6: 10.1.2.6:/media/disk1
                     Brick7: 10.1.2.7:/media/disk1
                     Brick8: 10.1.2.8:/media/disk1
                     Brick9: 10.1.2.9:/media/disk1
                     Brick10: 10.1.2.10:/media/disk1
                     Brick11: 10.1.2.1:/media/disk2
                     Brick12: 10.1.2.2:/media/disk2
                     Brick13: 10.1.2.3:/media/disk2
                     Brick14: 10.1.2.4:/media/disk2
                     Brick15: 10.1.2.5:/media/disk2
                     Brick16: 10.1.2.6:/media/disk2
                     Brick17: 10.1.2.7:/media/disk2
                     Brick18: 10.1.2.8:/media/disk2
                     Brick19: 10.1.2.9:/media/disk2
                     Brick20: 10.1.2.10:/media/disk2
                     ...
                     ....
                     Brick351: 10.1.2.1:/media/disk36
                     Brick352: 10.1.2.2:/media/disk36
                     Brick353: 10.1.2.3:/media/disk36
                     Brick354: 10.1.2.4:/media/disk36
                     Brick355: 10.1.2.5:/media/disk36
                     Brick356: 10.1.2.6:/media/disk36
                     Brick357: 10.1.2.7:/media/disk36
                     Brick358: 10.1.2.8:/media/disk36
                     Brick359: 10.1.2.9:/media/disk36
                     Brick360: 10.1.2.10:/media/disk36
                     Options Reconfigured:
                     performance.readdir-ahead: on

                     We did some performance testing and simulated the
            proactive self
                     healing
                     for Erasure coding. Disperse volume has been
            created across
            nodes.

                     _*Description of problem*_

                     I disconnected the *network of two nodes* and tried
            to write
                     some video
                     files and *glusterfs* *wrote the video files on
            balance 8 nodes
                     perfectly*. I tried to download the uploaded file
            and it was
                     downloaded
                     perfectly. Then i enabled the network of two nodes,
            the pro
                     active self
                     healing mechanism worked perfectly and wrote the
            unavailable
            chunk of
                     data to the recently enabled node from the other 8
            nodes. But
            when i
                     tried to download the same file node, it showed
            Input/Output
                     error. I
                     couldn't download the file. I think there is an
            issue in pro
                     active self
                     healing.

                     Also we tried the simulation with one node network
            failure. We
            faced
                     same I/O error issue while downloading the file


                     _Error while downloading file _
                     _
                     _

                     root@master02:/home/admin# rsync -r --progress
                     /mnt/gluster/file13_AN
                     ./1/file13_AN-2

                     sending incremental file list

                     file13_AN

                         3,342,355,597 100% 4.87MB/s    0:10:54 (xfr#1,
            to-chk=0/1)

                     rsync: read errors mapping "/mnt/gluster/file13_AN":
                     Input/output error (5)

                     WARNING: file13_AN failed verification -- update
            discarded (will
                     try again).

                        root@master02:/home/admin# cp /mnt/gluster/file13_AN
                     ./1/file13_AN-3

                     cp: error reading ‘/mnt/gluster/file13_AN’:
            Input/output error

                     cp: failed to extend ‘./1/file13_AN-3’:
            Input/output error_
                     _


                     We can't conclude the issue with glusterfs 3.7.0 or
            our glusterfs
                     configuration.

                     Any help would be greatly appreciated

                     --
                     Cheers
                     Backer



                     _______________________________________________
                     Gluster-users mailing list
            Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
            <mailto:Gluster-users@xxxxxxxxxxx
            <mailto:Gluster-users@xxxxxxxxxxx>>
            http://www.gluster.org/mailman/listinfo/gluster-users






        _______________________________________________
        Gluster-users mailing list
        Gluster-users@xxxxxxxxxxx <mailto:Gluster-users@xxxxxxxxxxx>
        http://www.gluster.org/mailman/listinfo/gluster-users








--
Thanks & Regards   
K.Mohamed Pakkeer
Mobile- 0091-8754410114

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
