Re: 3.8.2 : Node not healing


 



On 20/08/2016 9:28 PM, Pranith Kumar Karampuri wrote:
Lindsay,
Please do "gluster volume set <volname> data-self-heal-algorithm full" (if you haven't already) to prevent diff self-heals (checksum computations on the files), which use a lot of CPU.

I'll give that a spin and see how it works out - it's a toss-up as to which is the bigger resource problem, CPU or bandwidth :)
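
For reference, I assume that boils down to the following on my volume (the "volume get" line is my guess at the full option key, cluster.data-self-heal-algorithm - correct me if that's wrong):

gluster volume set datastore4 data-self-heal-algorithm full
gluster volume get datastore4 cluster.data-self-heal-algorithm   # confirm it took effect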

One more thing that could have led to a lot of CPU is full directory heals on .shard. Krutika recently implemented a feature called granular entry self-heal which should address this issue in future. We have a throttling feature coming along in future as well to play nice with the rest of the system.

I already have "cluster.granular-entry-heal: on" and "cluster.locking-scheme: granular" set, or are you saying that feature has improvements yet to come?
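
(FWIW, I checked the current values with "gluster volume get", which as far as I know reports the effective setting per option:)

gluster volume get datastore4 cluster.granular-entry-heal
gluster volume get datastore4 cluster.locking-scheme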


Anyway, I'm not really looking at CPU hogging (well, not much anyway :)); rather I was trying to find out why heals were not starting. With my first test I had 25,000 shards needing healing and nothing happened for over 3 hours until I shut down all the VMs on the node and restarted it.
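
(The numbers above come from the usual heal status commands - I believe these are the relevant ones:)

gluster volume heal datastore4 statistics heal-count   # per-brick count of entries pending heal
gluster volume heal datastore4 info                    # list the pending entries themselves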


I did the same test yesterday (roughly the commands sketched below):
- killed all gluster processes on a node
- waited until heal-count rose to 1500
- restarted gluster on that node
- nothing happened for 45 minutes (heal-count stayed at 1500)
- I shut down all VMs on that node
- healing started within several minutes and completed in under half an hour
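
Roughly what that looked like on the command line (service names are from my Proxmox/Debian boxes, and the kill step is deliberately blunt - adjust for your environment):

# on the node under test: kill management, brick and self-heal daemon processes
pkill glusterd; pkill glusterfsd; pkill glusterfs

# from another node: watch the pending heal count climb
watch -n 60 'gluster volume heal datastore4 statistics heal-count'

# once it hit ~1500, restart gluster on the test node
systemctl start glusterfs-server    # or glusterd, depending on packaging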

Which leads me to wonder whether having active local I/O on a gluster node when you crash and restart the gluster processes (as opposed to rebooting the node) blocks the heals from starting.

If so, it's not a huge issue for me - typically that will never happen, as gluster never actually crashes on me :) The most likely scenarios are rolling upgrades or hard reboots.



gluster v info

Volume Name: datastore4
Type: Replicate
Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
Options Reconfigured:
cluster.locking-scheme: granular
cluster.granular-entry-heal: on
performance.readdir-ahead: on
cluster.self-heal-window-size: 1024
cluster.data-self-heal: on
features.shard: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
performance.strict-write-ordering: off
performance.stat-prefetch: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard-block-size: 64MB
cluster.background-self-heal-count: 16


--
Lindsay Mathieson




