Re: VM going down

Alessandro Briosi <ab1@xxxxxxxxxxx> · Tue, 9 May 2017 09:29:52 +0200



    On 08/05/2017 15:49, Alessandro Briosi wrote:

    
      On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:

      
        I don't know if this has any relation to your issue, but I have
            seen several times during gluster healing that my VMs fail or
            are marked unresponsive in RHEV. My conclusion is that the load
            gluster puts on the VM images during checksumming while healing
            results in too much latency, and the VMs fail.
         
        My plan is to try sharding, so that the VM images are split into
            smaller files, and also to change the number of allowed
            concurrent heals (‘cluster.background-self-heal-count’) and
            disable ‘cluster.self-heal-daemon’.
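
For reference, the three changes mentioned above correspond to `gluster volume set` options. The following is a minimal sketch that only builds the command lines and prints them (it does not run anything). The volume name `myvol` and the heal count of 4 are placeholder assumptions, not values taken from this thread, and `tuning_commands` is a hypothetical helper.

```python
def tuning_commands(volume, heal_count, shard=True, self_heal_daemon=False):
    """Build `gluster volume set` command lines for the settings discussed.

    `volume` and `heal_count` are supplied by the caller; nothing here is
    a recommended value.
    """
    settings = []
    if shard:
        # Split large image files into smaller shard files.
        settings.append(("features.shard", "on"))
    # Limit how many heals may run in the background at once.
    settings.append(("cluster.background-self-heal-count", str(heal_count)))
    # Turn the self-heal daemon on or off.
    settings.append(
        ("cluster.self-heal-daemon", "on" if self_heal_daemon else "off")
    )
    return [
        ["gluster", "volume", "set", volume, key, value]
        for key, value in settings
    ]


# Print the commands for review instead of executing them.
for argv in tuning_commands("myvol", heal_count=4):
    print(" ".join(argv))
```

Printing rather than executing keeps the sketch safe to run on any machine; the resulting lines can be reviewed and then run by hand on a gluster node.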
        
      
      The thing is that there are no heal processes running, no log
      entries either.

      A few days ago I had a failure, and the heal process started and
      finished without any problems.

      
      I do not use sharding yet.

    
    Well, it happened again on a different volume and a different VM.

    
    This time a self heal process was started.

    
    Why is this happening? There are no network problems on the hosts,
    and they all have bonded 2x1Gbit NICs dedicated to gluster.

    
    Is there any information I can give you to find out what happened?

    
    This is the only mention of healing in the logs:

    [2017-05-08 17:34:40.474774] I [MSGID: 108026]
    [afr-self-heal-common.c:1254:afr_log_selfheal]
    0-datastore1-replicate-0: Completed data selfheal on
    bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1]  sinks=0 2

    
    The VM went down about 1 1/2 hours earlier:

    [2017-05-08 15:54:11.781749] I [MSGID: 115036]
    [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
    connection from
    srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0

    [2017-05-08 15:54:11.781749] I [MSGID: 115036]
    [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting
    connection from
    srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0

    [2017-05-08 15:54:11.781840] I [MSGID: 115013]
    [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup
    on /images/101/vm-101-disk-2.qcow2

    [2017-05-08 15:54:11.781838] W
    [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server:
    releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by
    {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}
    [2017-05-08 15:54:11.781863] I [MSGID: 115013]
    [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup
    on /images/101/vm-101-disk-1.qcow2

    [2017-05-08 15:54:11.781947] I [MSGID: 101055]
    [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
    connection
    srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0

    [2017-05-08 15:54:11.781971] I [MSGID: 101055]
    [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down
    connection
    srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
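
To pin down the gap between the client disconnect and the completed self-heal, the timestamps can be compared directly. This is a minimal sketch: the two entries are copied from the excerpts above and re-joined onto one line each, while the helper names `parse_timestamp` and `elapsed` are hypothetical and it assumes every entry starts with a `[YYYY-MM-DD HH:MM:SS.ffffff]` stamp.

```python
import re
from datetime import datetime

# Two entries copied from the log excerpts above, one line each.
LOG_LINES = [
    "[2017-05-08 15:54:11.781749] I [MSGID: 115036] "
    "[server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting "
    "connection from "
    "srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0",
    "[2017-05-08 17:34:40.474774] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1254:afr_log_selfheal] "
    "0-datastore1-replicate-0: Completed data selfheal on "
    "bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1]  sinks=0 2",
]

# Leading "[date time.microseconds]" stamp of a log entry.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def parse_timestamp(line):
    """Return the leading timestamp of a log entry as a datetime."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError("no leading timestamp in: " + line[:40])
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def elapsed(first, second):
    """Return the time between two log entries as a timedelta."""
    return parse_timestamp(second) - parse_timestamp(first)


gap = elapsed(LOG_LINES[0], LOG_LINES[1])
print(gap)  # 1:40:28.693025
```

So the heal completed roughly 1 hour 40 minutes after the connection from srvpve1 was dropped.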

    
    Any hint would be greatly appreciated.

    
    Alessandro

    
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users