I recently enabled an SSD-backed 2 TB hot tier on my 150 TB distributed-replicated volume (2 servers, 3 bricks per server).
I'm seeing I/O block across all client FUSE threads for 10 to 15 seconds whenever the promotion daemon runs. During these stalls the 'glustertierpro' thread jumps to 99% CPU usage on both servers. The stalls occur every 25 minutes, which matches my tier-promote-frequency setting.
I suspect this is related to the SQLite heat database: perhaps something is locked while the query that selects files to promote is running. My volume contains approximately 18 million files.
Has anybody else seen this? I expect these delays to get worse as I add more files to the volume, which would cause significant problems.
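For anyone who wants to check whether they see the same stalls, here is a rough sketch of a latency probe that times a small write on the client mount and logs slow ones. It is only an illustration: the function names (probe_once, probe_loop) and the temporary file name are made up, and it assumes bash and GNU date (for %N).

```bash
#!/usr/bin/env bash
# Hypothetical latency probe; not part of Gluster.
# Assumes bash and GNU date (the %N nanosecond format).

# Print how many milliseconds one small write + delete in directory $1 takes.
probe_once() {
    local dir=$1 start end
    start=$(date +%s%N)
    echo probe > "$dir/.latency_probe.$$"
    rm -f "$dir/.latency_probe.$$"
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Probe directory $1 every $2 seconds, $3 times in total,
# and log every probe that took at least $4 milliseconds.
probe_loop() {
    local dir=$1 interval=$2 count=$3 threshold_ms=$4 i ms
    for (( i = 0; i < count; i++ )); do
        ms=$(probe_once "$dir")
        if (( ms >= threshold_ms )); then
            echo "$(date '+%F %T') stall: ${ms} ms"
        fi
        sleep "$interval"
    done
}
```

Running something like `probe_loop /mnt/gv0 1 3600 1000` (where /mnt/gv0 stands in for your own FUSE mount point) for an hour should show whether the logged stalls line up with the promotion interval.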
Here are my hot tier settings:
# gluster volume get gv0 all | grep tier
cluster.tier-pause off
cluster.tier-promote-frequency 1500
cluster.tier-demote-frequency 3600
cluster.tier-mode cache
cluster.tier-max-promote-file-size 10485760
cluster.tier-max-mb 64000
cluster.tier-max-files 100000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 86400
cluster.tier-cold-compact-frequency 86400
# gluster volume get gv0 all | grep threshold
cluster.write-freq-threshold 2
cluster.read-freq-threshold 5
# gluster volume get gv0 all | grep watermark
cluster.watermark-hi 92
cluster.watermark-low 75
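
To make the raw values above easier to read, here is a quick unit check. It assumes the two frequency options are in seconds (which matches the 25-minute interval I observe) and that tier-max-promote-file-size is in bytes. The function name tier_units is only for illustration.

```bash
# Convert a few of the settings above into friendlier units.
# Assumption: frequencies are seconds, max promote file size is bytes.
tier_units() {
    local promote_freq_s=1500
    local demote_freq_s=3600
    local max_promote_bytes=10485760
    echo "promote every $(( promote_freq_s / 60 )) min"
    echo "demote every $(( demote_freq_s / 60 )) min"
    echo "max promoted file size $(( max_promote_bytes / 1024 / 1024 )) MiB"
}
tier_units
```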