Re: heal info OK but statistics not working

On 09/04/2017 07:35 PM, Atin Mukherjee wrote:
Ravi/Karthick,

If one of the self-heal processes is down, will the statistics heal-count command work?

No, it doesn't seem to: the glusterd stage-op phase fails because shd was down on that node, and we error out.
FWIW, the error message "Gathering crawl statistics on volume GROUP-WORK has been unsuccessful on bricks that are down. Please check if all brick processes are running." is incorrect. Once https://review.gluster.org/#/c/15724/ gets merged, you will get the correct error message, like so:

[root@vm2 glusterfs]# gluster v heal testvol statistics
Gathering crawl statistics on volume testvol has been unsuccessful:
 Staging failed on vm1. Error: Self-heal daemon is not running. Check self-heal daemon log file.
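
As a quick local check (a sketch only, independent of the gluster CLI), you can verify whether the self-heal daemon is actually up on a node before running the statistics commands. The process name "glustershd" below is an assumption matching the default daemon name:

```shell
# Hedged sketch: report whether the local self-heal daemon (glustershd)
# appears in the process table. "glustershd" is the default process
# name; adjust if your distribution names it differently.
if pgrep -f glustershd >/dev/null 2>&1; then
  echo "glustershd running"
else
  echo "glustershd NOT running"
fi
```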


-Ravi

On Mon, Sep 4, 2017 at 7:24 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:
1) One peer, out of four, got separated from the network, i.e. from the rest of the cluster.
2) That unavailable peer (while it was unavailable) was detached with the "gluster peer detach" command, which succeeded, so the cluster now comprises three peers.
3) The self-heal daemon (for some reason) does not start (even after an attempt to restart glusterd) on the peer which probed that fourth peer.
4) The fourth, unavailable peer is still up & running but is inaccessible to the other peers because the network is disconnected, segmented. That peer's gluster status shows it is still in the cluster.
5) So the fourth peer's gluster stack (and its other processes) did not fail or crash; only the network got, and is, disconnected.
6) "gluster peer status" shows OK & connected for the current three peers.

This is the third time it has happened to me, in exactly the same way: each time, once the net-disjointed peer was brought back online, statistics & details worked again.

Can you reproduce it?
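
The partition scenario above could be scripted roughly like this. A hypothetical sketch only: the volume name and IP addresses are examples taken from this thread, the iptables rules assume root on the peer being isolated, and your addresses will differ:

```shell
# Hypothetical reproduction sketch for the network-partition scenario.
# Run isolate_peer on the node to be cut off from the cluster.
isolate_peer() {
  # Drop traffic to/from the surviving peers (example addresses):
  for peer in 10.5.6.32 10.5.6.49 10.5.6.100; do
    iptables -I INPUT  -s "$peer" -j DROP
    iptables -I OUTPUT -d "$peer" -j DROP
  done
}

check_heal_stats() {
  # From a surviving peer; expected to error out while the isolated
  # node (and its self-heal daemon) is unreachable:
  gluster volume heal QEMU-VMs statistics heal-count
}
```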

$ gluster vol info QEMU-VMs

Volume Name: QEMU-VMs
Type: Replicate
Volume ID: 8709782a-daa5-4434-a816-c4e0aef8fef2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick2: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Brick3: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-QEMU-VMs
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
storage.owner-gid: 107
storage.owner-uid: 107
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

$ gluster vol status QEMU-VMs
Status of volume: QEMU-VMs
Gluster process                             TCP Port  RDMA Port Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                      49156     0 Y       9302
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-QEMU-VMs                      49156     0 Y       7610
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-QEMU-VMs                     49156     0 Y       11013
Self-heal Daemon on localhost               N/A       N/A Y       3069276
Self-heal Daemon on 10.5.6.32               N/A       N/A Y       3315870
Self-heal Daemon on 10.5.6.49               N/A       N/A N       N/A  <--- HERE
Self-heal Daemon on 10.5.6.17               N/A       N/A Y       5163

Task Status of Volume QEMU-VMs
------------------------------------------------------------------------------
There are no active volume tasks

$ gluster vol heal QEMU-VMs statistics heal-count
Gathering count of entries to be healed on volume QEMU-VMs has been unsuccessful on bricks that are down. Please check if all brick processes are running.
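
When the status output shows the self-heal daemon offline on one node (as above), a commonly used way to re-spawn it is "gluster volume start <vol> force", which restarts missing daemons without disturbing bricks that are already online. A sketch, using the volume name from this thread:

```shell
# Hedged sketch: "start ... force" re-spawns missing daemons (including
# glustershd) for an already-started volume; bricks that are already
# online are left alone. Run on any peer in the cluster.
restart_shd() {
  gluster volume start QEMU-VMs force
  # Verify: the shd entries should now show Online: Y
  gluster volume status QEMU-VMs shd
}
```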



On 04/09/17 11:47, Atin Mukherjee wrote:
Please provide the output of gluster volume info, gluster volume status, and gluster peer status.

On Mon, Sep 4, 2017 at 4:07 PM, lejeczek <peljasz@xxxxxxxxxxx> wrote:

    hi all

    this:
    $ gluster vol heal $_vol info
    outputs OK and the exit code is 0.
    But if I want to see statistics:
    $ gluster vol heal $_vol statistics
    Gathering crawl statistics on volume GROUP-WORK has
    been unsuccessful on bricks that are down. Please
    check if all brick processes are running.

    I suspect gluster's inability to cope with a situation
    where one peer (which is not even a brick for a single
    vol in the cluster!) is inaccessible to the rest of the
    cluster.
    I have not played with any other variations of this
    case, e.g. more than one peer going down, etc.
    But I hope someone could try to replicate this simple
    test case.

    When something like this happens, the cluster and vols
    seem accessible, and as such "all" works, except when
    you want more details.
    This also fails:
    $ gluster vol status $_vol detail
    Error : Request timed out

    My gluster (3.10.5-1.el7.x86_64) exhibits these
    symptoms every time at least one peer goes out of
    the reach of the rest.

    maybe @devel can comment?

    many thanks, L.
    _______________________________________________
    Gluster-users mailing list
    Gluster-users@xxxxxxxxxxx
    http://lists.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

