I should add that additional testing has shown that only access to files is held up; IO is not interrupted for transfers that are already in progress. I think this points to the heat metadata in the sqlite DB for the tier. Is it possible that a table is temporarily locked while the promotion daemon runs, so that the calls to update the access count on files are blocked?
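For what it's worth, that symptom is at least consistent with plain SQLite write-lock contention. Here is a minimal, self-contained sketch (the database path, table, and column names below are invented for the demo and are not the real heat DB schema) showing that while one connection holds a write transaction, another connection's UPDATE either fails immediately with "database is locked" or stalls until the first one commits:

#!/bin/bash
# Toy stand-in for the tier heat DB -- path, table, and column names are
# invented for this demo; only SQLite's locking behaviour is the point.
DB=/tmp/heat-demo.db
rm -f "$DB"
sqlite3 "$DB" "CREATE TABLE heat (gfid TEXT PRIMARY KEY, read_count INTEGER);
               INSERT INTO heat VALUES ('abc', 0);"

# "Promotion daemon": take the write lock and keep the transaction open for
# 10 seconds, roughly what a long promotion query plus updates would do.
{ echo ".timeout 15000"
  echo "BEGIN IMMEDIATE;"
  echo "SELECT count(*) FROM heat;"
  sleep 10
  echo "COMMIT;"; } | sqlite3 "$DB" &

sleep 1  # give the transaction time to start

# "Brick": try to bump an access count while the lock is held. With no busy
# timeout the update fails immediately with "database is locked"; with a
# generous timeout it simply stalls until the daemon's COMMIT.
time sqlite3 "$DB" "UPDATE heat SET read_count = read_count + 1 WHERE gfid = 'abc';"
time sqlite3 "$DB" <<'EOF'
.timeout 15000
UPDATE heat SET read_count = read_count + 1 WHERE gfid = 'abc';
EOF
wait

If the brick-side counter updates retry with a busy timeout, like the second UPDATE above, a long promotion transaction would show up as a multi-second stall rather than an error, which would match what I'm seeing.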
On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <tomfite@xxxxxxxxx> wrote:
The sizes of the files are extremely varied: there are millions of small (<1 MB) files and thousands of files larger than 1 GB.

Attached are the tier logs for gluster1 and gluster2. These are full of "demotion failed" messages, which is also shown in the status:

[root@pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
Node                 Promoted files   Demoted files   Status        run time in h:m:s
---------            ---------        ---------       ---------     ---------
localhost            25940            0               in progress   112:21:49
pod-sjc1-gluster2    0                2917154         in progress   112:21:49

Is it normal for each server to handle only promotions or only demotions, rather than both?

Volume info:

[root@pod-sjc1-gluster1 ~]# gluster volume info

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: pod-sjc1-gluster1:/data/brick1/gv0
Brick2: pod-sjc1-gluster2:/data/brick1/gv0
Brick3: pod-sjc1-gluster1:/data/brick2/gv0
Brick4: pod-sjc1-gluster2:/data/brick2/gv0
Brick5: pod-sjc1-gluster1:/data/brick3/gv0
Brick6: pod-sjc1-gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.allow-insecure: on
performance.flush-behind: on
performance.rda-cache-limit: 32MB
network.tcp-window-size: 1048576
performance.nfs.io-threads: on
performance.write-behind-window-size: 4MB
performance.nfs.write-behind-window-size: 512MB
performance.io-cache: on
performance.quick-read: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 4GB
server.event-threads: 16
client.event-threads: 16
features.barrier: disable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 1024
auto-delete: enable

# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick pod-sjc1-gluster2:/data/hot_tier/gv0  49219     0          Y       26714
Brick pod-sjc1-gluster1:/data/hot_tier/gv0  49199     0          Y       21325
Cold Bricks:
Brick pod-sjc1-gluster1:/data/brick1/gv0    49152     0          Y       3178
Brick pod-sjc1-gluster2:/data/brick1/gv0    49152     0          Y       4818
Brick pod-sjc1-gluster1:/data/brick2/gv0    49153     0          Y       3186
Brick pod-sjc1-gluster2:/data/brick2/gv0    49153     0          Y       4829
Brick pod-sjc1-gluster1:/data/brick3/gv0    49154     0          Y       3194
Brick pod-sjc1-gluster2:/data/brick3/gv0    49154     0          Y       4840
Tier Daemon on localhost                    N/A       N/A        Y       20313
Self-heal Daemon on localhost               N/A       N/A        Y       32023
Tier Daemon on pod-sjc1-gluster1            N/A       N/A        Y       24758
Self-heal Daemon on pod-sjc1-gluster2       N/A       N/A        Y       12349

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <hgowtham@xxxxxxxxxx> wrote:

Hi,
Can you send the volume info, volume status output, and the tier logs?
And I need to know the size of the files that are being stored.
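(For reference, something like the following would collect all of that. The log file name pattern and the brick path below are guesses based on this thread, so adjust them to match the actual setup:)

# Volume layout, per-brick status, and tier activity
gluster volume info gv0
gluster volume status gv0
gluster volume tier gv0 status

# The tier/rebalance logs live under /var/log/glusterfs/ on each server;
# the exact file name varies by version, so grab anything tier-related.
ls -l /var/log/glusterfs/*tier*.log*

# Rough file size distribution, sampled from one brick (skipping gluster's
# internal .glusterfs directory).
find /data/brick1/gv0 -path '*/.glusterfs' -prune -o -type f -size -1M -print | wc -l
find /data/brick1/gv0 -path '*/.glusterfs' -prune -o -type f -size +1G -print | wc -l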
On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <tomfite@xxxxxxxxx> wrote:
> I've recently enabled an SSD-backed 2 TB hot tier on my 150 TB distributed
> replicated volume (2 servers, 3 bricks per server).
>
> I'm seeing IO get blocked across all client FUSE threads for 10 to 15
> seconds while the promotion daemon runs. I see the 'glustertierpro' thread
> jump to 99% CPU usage on both boxes when these delays occur, and they happen
> every 25 minutes, matching my tier-promote-frequency setting of 1500 seconds.
>
> I suspect this has something to do with the heat database in sqlite; maybe
> something is getting locked while it runs the query to determine which files to
> promote. My volume contains approximately 18 million files.
>
> Has anybody else seen this? I suspect that these delays will get worse as I
> add more files to my volume, which will cause significant problems.
>
> Here are my hot tier settings:
>
> # gluster volume get gv0 all | grep tier
> cluster.tier-pause off
> cluster.tier-promote-frequency 1500
> cluster.tier-demote-frequency 3600
> cluster.tier-mode cache
> cluster.tier-max-promote-file-size 10485760
> cluster.tier-max-mb 64000
> cluster.tier-max-files 100000
> cluster.tier-query-limit 100
> cluster.tier-compact on
> cluster.tier-hot-compact-frequency 86400
> cluster.tier-cold-compact-frequency 86400
>
> # gluster volume get gv0 all | grep threshold
> cluster.write-freq-threshold 2
> cluster.read-freq-threshold 5
>
> # gluster volume get gv0 all | grep watermark
> cluster.watermark-hi 92
> cluster.watermark-low 75
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users
--
Regards,
Hari Gowtham.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users