Re: Very poor heal behaviour in 3.7.9

Krutika Dhananjay <kdhananj@xxxxxxxxxx> · Fri, 25 Mar 2016 18:03:47 +0530

Hi,

A bug was uncovered recently in which the same file could get healed twice before being marked as no longer needing a heal.
Pranith sent a patch @ http://review.gluster.org/#/c/13766/ to fix this, although IIUC this bug existed in versions < 3.7.9 as well.
Also, because of this bug, files that need heal may appear in heal-info output longer than they ought to.
Did you see this issue in versions < 3.7.9 as well?
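
To compare versions, it may help to record the per-brick pending counts from heal-info at intervals and see how long they take to drain. Here is a minimal sketch, assuming the heal-info text contains "Brick <host>:<path>" header lines followed by a "Number of entries: N" line per brick. The function name and the sample text are made up for illustration; the sample is not real output from this cluster.

```python
import re

def count_pending_heals(heal_info_output: str) -> dict:
    """Map each brick to its 'Number of entries' count.

    Assumes 'Brick <host>:<path>' header lines and one
    'Number of entries: N' line per brick (an assumption about
    the output layout, not a guaranteed format).
    """
    counts = {}
    brick = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            continue
        match = re.match(r"Number of entries:\s*(\d+)", line)
        if match and brick is not None:
            counts[brick] = int(match.group(1))
    return counts

# Hand-written sample for illustration only (hypothetical entry name).
sample = """\
Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
/.shard/example-shard.1
Number of entries: 1

Brick vng.proxmox.softlog:/tank/vmdata/datastore2
Number of entries: 0
"""

if __name__ == "__main__":
    for brick, pending in count_pending_heals(sample).items():
        print(brick, pending)
```

Bricks whose count is not a plain number are skipped rather than guessed at.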

-Krutika

On Fri, Mar 25, 2016 at 1:04 PM, Lindsay Mathieson <lindsay.mathieson@xxxxxxxxx> wrote:

    Have resumed testing with 3.7.9 - this time I have proper hardware
    behind it:

    - 3 nodes

    - each node with 4 WD Reds in ZFS raid 10 

    - SSD for slog and cache.

    Using a sharded VM setup (4MB shards) and performance has been
    excellent, better than ceph on the same hardware. I have some
    interesting notes on that I will detail later.

    However, unlike with 3.7.7, heal performance has been abysmal - deal
    breaking in fact. Maybe it's my setup?

    Have been testing healing by killing the glusterfsd and glusterd
    processes on another node and letting a VM run. Everything is fine at
    this point: despite a node being down, reads and writes continue
    normally.

    However a heal info shows what appears to be an excessive number of
    shards being marked as needing heals. A simple reboot of a Windows
    VM results in 360 4MB shards - 1.5GB of data. A compile resulted in
    7GB of shards being touched. Could there be some write amplification
    at work?
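
The shard figures above can be sanity-checked with a little arithmetic. This is a rough sketch only, assuming each listed shard counts as a full 4MB and that 1GB means 1024MB; the function names are illustrative. It shows how much data is flagged for heal, not how many bytes the guest actually wrote.

```python
SHARD_MB = 4  # shard size from the setup described above

def shards_to_mb(shard_count: int) -> int:
    """Data flagged for heal if every listed shard is a full 4MB."""
    return shard_count * SHARD_MB

def gb_to_shards(gigabytes: int) -> int:
    """Number of 4MB shards in the given volume (1GB taken as 1024MB)."""
    return gigabytes * 1024 // SHARD_MB

if __name__ == "__main__":
    print(shards_to_mb(360))  # 1440 (MB), in line with the roughly 1.5GB quoted
    print(gb_to_shards(7))    # 1792 shards for the 7GB compile
```

Since a shard is flagged whole even if only a small part of it was written, the flagged volume can be much larger than the data the VM actually changed.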

    However, once I restart the glusterd process, which starts glusterfsd,
    performance becomes atrocious. Disk IO nearly stops and any running
    VMs hang or slow down a *lot* until the heal is complete. The
    "heal info" command appears to hang as well, not completing at all.
    A build process that was taking 4 minutes took over an hour.

    Once the heal finishes, I/O returns to normal.

    Here's a fragment of the glfsheal log:

    [2016-03-25 07:12:51.041590] I [MSGID: 114057]
      [client-handshake.c:1437:select_server_supported_programs]
      0-datastore2-client-2: Using Program GlusterFS 3.3, Num (1298437),
      Version (330)

    [2016-03-25 07:12:51.041637] I
      [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-datastore2-client-1:
      changing port to 49153 (from 0)

    [2016-03-25 07:12:51.041808] I [MSGID: 114046]
      [client-handshake.c:1213:client_setvolume_cbk]
      0-datastore2-client-2: Connected to datastore2-client-2, attached
      to remote volume '/tank/vmdata/datastore2'.

    [2016-03-25 07:12:51.041826] I [MSGID: 114047]
      [client-handshake.c:1224:client_setvolume_cbk]
      0-datastore2-client-2: Server and Client lk-version numbers are
      not same, reopening the fds

    [2016-03-25 07:12:51.041901] I [MSGID: 108005]
      [afr-common.c:4010:afr_notify] 0-datastore2-replicate-0: Subvolume
      'datastore2-client-2' came back up; going online.

    [2016-03-25 07:12:51.041929] I [MSGID: 114057]
      [client-handshake.c:1437:select_server_supported_programs]
      0-datastore2-client-0: Using Program GlusterFS 3.3, Num (1298437),
      Version (330)

    [2016-03-25 07:12:51.041955] I [MSGID: 114035]
      [client-handshake.c:193:client_set_lk_version_cbk]
      0-datastore2-client-2: Server lk version = 1

    [2016-03-25 07:12:51.042319] I [MSGID: 114046]
      [client-handshake.c:1213:client_setvolume_cbk]
      0-datastore2-client-0: Connected to datastore2-client-0, attached
      to remote volume '/tank/vmdata/datastore2'.

    [2016-03-25 07:12:51.042333] I [MSGID: 114047]
      [client-handshake.c:1224:client_setvolume_cbk]
      0-datastore2-client-0: Server and Client lk-version numbers are
      not same, reopening the fds

    [2016-03-25 07:12:51.042455] I [MSGID: 114057]
      [client-handshake.c:1437:select_server_supported_programs]
      0-datastore2-client-1: Using Program GlusterFS 3.3, Num (1298437),
      Version (330)

    [2016-03-25 07:12:51.042520] I [MSGID: 114035]
      [client-handshake.c:193:client_set_lk_version_cbk]
      0-datastore2-client-0: Server lk version = 1

    [2016-03-25 07:12:51.042846] I [MSGID: 114046]
      [client-handshake.c:1213:client_setvolume_cbk]
      0-datastore2-client-1: Connected to datastore2-client-1, attached
      to remote volume '/tank/vmdata/datastore2'.

    [2016-03-25 07:12:51.042867] I [MSGID: 114047]
      [client-handshake.c:1224:client_setvolume_cbk]
      0-datastore2-client-1: Server and Client lk-version numbers are
      not same, reopening the fds

    [2016-03-25 07:12:51.058131] I [MSGID: 114035]
      [client-handshake.c:193:client_set_lk_version_cbk]
      0-datastore2-client-1: Server lk version = 1

    [2016-03-25 07:12:51.059075] I [MSGID: 108031]
      [afr-common.c:1913:afr_local_discovery_cbk]
      0-datastore2-replicate-0: selecting local read_child
      datastore2-client-2

    [2016-03-25 07:12:51.059619] I [MSGID: 104041]
      [glfs-resolve.c:869:__glfs_active_subvol] 0-datastore2: switched
      to graph 766e612d-3739-3437-352d-323031362d30 (0)

    I have no idea why client version 3.3 is being used! Everything
    should be 3.7.9.

    Environment:

    - Proxmox (debian Jessie, 8.2)

    - KVM VM's using gfapi, running on the same nodes as the gluster
    bricks

    - bricks are hosted on 3 ZFS Pools (one per node)

        * compression=lz4

        * xattr=sa

        * sync=standard

        * acltype=posixacl 

    Volume info:

    Volume Name: datastore2

    Type: Replicate

    Volume ID: 7d93a1c6-ac39-4d94-b136-e8379643bddd

    Status: Started

    Number of Bricks: 1 x 3 = 3

    Transport-type: tcp

    Bricks:

    Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore2

    Brick2: vng.proxmox.softlog:/tank/vmdata/datastore2

    Brick3: vna.proxmox.softlog:/tank/vmdata/datastore2

    Options Reconfigured:

    performance.readdir-ahead: on

    nfs.addr-namelookup: off

    nfs.enable-ino32: off

    features.shard: on

    cluster.quorum-type: auto

    cluster.server-quorum-type: server

    nfs.disable: on

    performance.write-behind: off

    performance.strict-write-ordering: on

    performance.stat-prefetch: off

    performance.quick-read: off

    performance.read-ahead: off

    performance.io-cache: off

    cluster.eager-lock: enable

    network.remote-dio: enable

    I can do any testing required, bring back logs etc. Can't build
    gluster though.

    thanks,

    -- 
Lindsay Mathieson

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://www.gluster.org/mailman/listinfo/gluster-users
