On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
Hi,
I have a distributed volume with 6 bricks, each with 5TB, and it's hosting large qcow2 VM disks (I know it's not a reliable setup, but the data is not important).
I started with 5 bricks, then added another one and started the rebalance process. Everything went well, but now I'm looking at the bricks' free space and I found that one brick is around 82% full while the others range from 20% to 60%.
The brick with the highest utilization is hosting more qcow2 disks than the other bricks, and whenever I start a rebalance it just completes in 0 seconds without moving any data.
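For reference, the expand-and-rebalance steps were presumably something along these lines (a sketch; the exact commands used aren't shown in the thread):

  gluster volume add-brick ctvvols ctv06:/vols/ctvvols
  gluster volume rebalance ctvvols start
  gluster volume rebalance ctvvols status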
What is your average file size in the cluster? And roughly how many files are there?
What will happen when the brick becomes full?
Once a brick's usage goes beyond 90%, new files won't be created on that brick, but existing files can still grow.
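If I remember correctly, that threshold is controlled by the cluster.min-free-disk volume option (the default reserves roughly 10% free space on each brick). As a sketch, it can be checked and adjusted with:

  gluster volume get ctvvols cluster.min-free-disk
  gluster volume set ctvvols cluster.min-free-disk 15%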
Can I move data manually from one brick to another?
Nope. It is not recommended; even though Gluster will try to find the file, it may break.
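If the goal is only to find out which brick a given file currently lives on, that can be checked safely from the client mount via the pathinfo xattr, for example (the mount point and file path here are just placeholders):

  getfattr -n trusted.glusterfs.pathinfo /mnt/ctvvols/images/<vm-disk>.qcow2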
Why is rebalance not distributing data evenly across all bricks?
Rebalance works based on the layout, so we need to see how the layouts are distributed. If one of your bricks has a higher capacity, it will be assigned a larger layout range.
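To see how the layout ranges are spread, you can dump the DHT layout xattr of a directory directly on each brick (run on the brick nodes; the directory below is only an example):

  getfattr -n trusted.glusterfs.dht -e hex /vols/ctvvols/<some-directory>

A brick that was assigned a wider hash range in trusted.glusterfs.dht will receive proportionally more new files.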
That is correct. As Rafi said, the layout matters here. Can you please send across the rebalance logs from all 6 nodes?
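Assuming the default log location, the rebalance log on each node should be /var/log/glusterfs/ctvvols-rebalance.log.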
Nodes running CentOS 7.3
Gluster 3.8.11
Volume info:
Volume Name: ctvvols
Type: Distribute
Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: ctv01:/vols/ctvvols
Brick2: ctv02:/vols/ctvvols
Brick3: ctv03:/vols/ctvvols
Brick4: ctv04:/vols/ctvvols
Brick5: ctv05:/vols/ctvvols
Brick6: ctv06:/vols/ctvvols
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: none
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: off
user.cifs: off
network.ping-timeout: 10
storage.owner-uid: 36
storage.owner-gid: 36
Rebalance log:
[2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
[2017-05-23 14:45:12.640043] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
[2017-05-23 14:45:12.645610] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
[2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
[2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078 took 0.00 secs
[2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
[2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 23
[2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 24
[2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 25
[2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 26
[2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 27
[2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 28
[2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 29
[2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 30
[2017-05-23 14:45:12.653659] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 31
[2017-05-23 14:45:12.653677] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 32
[2017-05-23 14:45:12.653692] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 33
[2017-05-23 14:45:12.653711] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 34
[2017-05-23 14:45:12.653723] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 35
[2017-05-23 14:45:12.653739] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 36
[2017-05-23 14:45:12.653759] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 37
[2017-05-23 14:45:12.653772] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 38
[2017-05-23 14:45:12.653789] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 39
[2017-05-23 14:45:12.653800] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 40
[2017-05-23 14:45:12.653811] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 41
[2017-05-23 14:45:12.653822] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 42
[2017-05-23 14:45:12.653836] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 43
[2017-05-23 14:45:12.653870] I [dht-rebalance.c:2246:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 44
[2017-05-23 14:45:12.654413] I [MSGID: 109028] [dht-rebalance.c:4079:gf_defrag_status_get] 0-ctvvols-dht: Rebalance is completed. Time taken is 0.00 secs
[2017-05-23 14:45:12.654428] I [MSGID: 109028] [dht-rebalance.c:4083:gf_defrag_status_get] 0-ctvvols-dht: Files migrated: 0, size: 0, lookups: 15, failures: 0, skipped: 0
[2017-05-23 14:45:12.654552] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff40ff88dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff41161acd5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7ff41161ab4b] ) 0-: received signum (15), shutting down
Appreciate your help
--
Respectfully
Mahdi A. Mahdi
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users