On 26/03/2016 12:14 AM, Ravishankar N wrote:
> I think you need the exact no. of files and size of files that need
> healing to make any meaningful comparison of self-heal performance
> across versions. VM workloads with sharding might not be the ideal
> 'reproducer' since you really don't know how many shards get modified
> when a replica is down and I/O on the VMs happen. I suppose you could
> try testing the heal performance of a specific no. of files on a
> sharded volume and compare results.
Maybe my subject description was poor - while heal progress is not great, it's the I/O stalls that *really* concern me. If I reboot a node (or it crashes, etc.), any VM running on the cluster at that point freezes on I/O access once the heal kicks in, and stays frozen until it finishes, which can take over an hour.
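For anyone wanting to watch what I'm seeing, I've been tracking heal progress while the VMs are stalled with the standard heal info command (volume name here is just a placeholder for mine):

    gluster volume heal myvol info

The count of entries needing heal drops very slowly, and the VMs only unfreeze once it reaches zero.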
I see similar behaviour noted in the thread "GlusterFS cluster stalls if one server from the cluster goes down and then comes back up".
I tried setting "cluster.data-self-heal" to off as suggested in that thread, and it seems to have improved things. I'm in the middle of maintenance right now and will test it more later.
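For reference, this is the command I used (again, volume name is a placeholder):

    gluster volume set myvol cluster.data-self-heal off

As I understand it, with this off the heals are left to the self-heal daemon instead of being triggered in the client I/O path, which would explain why the VMs stop stalling on access to files that still need healing.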
thanks,
--
Lindsay Mathieson