SelfHeal/AutoHeal thread cap on metadata-heavy workload

Hello there,

We're working on migrating over 100 million small files to Gluster 3.7, with bricks sitting on XFS filesystems. We started with a single node and optimized NFS and other aspects of GlusterFS to perform best for our workload. When we brought the second node online, we started seeing very heavy CPU utilization, mostly due to the self-heal daemon (SHD). We've tried turning SHD off and letting autoheal take care of replicating data across the nodes, yet with some directories holding over 2 million files, it has proved very difficult to keep CPU utilization under control.
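For clarity, by "turning SHD off" I mean the per-volume toggle, roughly like this (volume name "gvol0" is a placeholder for ours):

    # Stop the self-heal daemon from crawling and healing on its own
    gluster volume set gvol0 cluster.self-heal-daemon off

    # Leave the client-side (auto) heal types on, so files that get
    # accessed still replicate over to the second node
    gluster volume set gvol0 cluster.data-self-heal on
    gluster volume set gvol0 cluster.metadata-self-heal on
    gluster volume set gvol0 cluster.entry-self-heal on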

Our setup has 6 bricks on each node, 2 nodes in a distributed-replicated layout, NFS mounts, and lots of small files, with server-threads, client-threads, and io-threads set to 16. We've tried reducing them to 4 but still see the same symptoms. I've spent a fair amount of time analyzing thread usage with strace, sysdig, and the like; no matter what we set in the thread configuration, SHD appears to use 24 threads (4 per brick, it seems).
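To be concrete, these are the thread settings I mean, assuming they map to the usual volume options ("gvol0" again a placeholder; we later dropped the 16s to 4 with no visible change):

    gluster volume set gvol0 server.event-threads 16
    gluster volume set gvol0 client.event-threads 16
    gluster volume set gvol0 performance.io-thread-count 16

None of these seem to touch the ~24 threads SHD spawns.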

Does anyone know how to throttle SHD and autoheal so that they don't consume too much CPU? Any other ideas on tuning GlusterFS for small-file / metadata-heavy workloads are appreciated, especially around adding/removing nodes/bricks.
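In case it helps frame an answer, the closest knobs we've found so far are below; as far as I can tell, none of them caps SHD's per-brick thread count on 3.7:

    # Limit how many heals a client will run in the background
    gluster volume set gvol0 cluster.background-self-heal-count 4

    # Disable/re-enable the self-heal daemon around bulk migration windows
    gluster volume heal gvol0 disable
    gluster volume heal gvol0 enable

(Newer releases appear to add cluster.shd-max-threads for exactly this kind of throttling, but it doesn't seem to be available on 3.7.)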

Thank you


--
Kayra Otaner
BilgiO A.Ş. -  SecOps Experts
PGP KeyID : A945251E | Manager, Enterprise Linux Solutions
www.bilgio.com |  TR +90 (532) 111-7240 x 1001 | US +1 (201) 206-2592
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
