Re: Blocking IO when hot tier promotion daemon runs

I should add that further testing has shown that only new file accesses are held up; IO is not interrupted for transfers already in progress. I think this points to the tier's heat metadata in the sqlite DB. Is it possible that a table is temporarily locked while the promotion daemon runs, so that the calls to update the access count on files are blocked?
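
For what it's worth, this would be consistent with how SQLite's default rollback-journal locking works: a writer needs an EXCLUSIVE lock on the whole database file to commit, and any open read transaction blocks it. Here is a minimal, self-contained sketch of that mechanism in plain Python -- it uses a toy table, not the actual gfdb schema:

import os
import sqlite3
import tempfile

# Toy stand-in for the per-brick heat DB (not the real gfdb schema).
path = os.path.join(tempfile.mkdtemp(), "heat.db")
setup = sqlite3.connect(path)
setup.execute("CREATE TABLE heat (gfid TEXT PRIMARY KEY, reads INTEGER)")
setup.execute("INSERT INTO heat VALUES ('abc', 1)")
setup.commit()
setup.close()

# Reader: stands in for the promotion daemon's long query. Holding a read
# transaction open keeps a SHARED lock on the database file.
reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
reader.execute("SELECT * FROM heat").fetchall()

# Writer: stands in for an access-count update on the client IO path. The
# UPDATE itself succeeds (RESERVED lock), but the commit needs EXCLUSIVE,
# which the reader's SHARED lock blocks until the busy timeout expires.
writer = sqlite3.connect(path, timeout=1)
writer.execute("UPDATE heat SET reads = reads + 1 WHERE gfid = 'abc'")
try:
    writer.commit()
except sqlite3.OperationalError as e:
    print("writer stalled:", e)  # prints "database is locked"
reader.execute("COMMIT")

If gfdb keeps the journal in the default rollback mode rather than WAL, a long promotion query would stall every access-count write in exactly this way.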


On Wed, Jan 10, 2018 at 10:17 AM, Tom Fite <tomfite@xxxxxxxxx> wrote:
The sizes of the files are extremely varied: there are millions of small (<1 MB) files and thousands of files larger than 1 GB.

Attached are the tier logs for gluster1 and gluster2. They are full of "demotion failed" messages, which also show up in the status:

[root@pod-sjc1-gluster1 gv0]# gluster volume tier gv0 status
Node                 Promoted files       Demoted files        Status               run time in h:m:s   
---------            ---------            ---------            ---------            ---------           
localhost            25940                0                    in progress          112:21:49
pod-sjc1-gluster2    0                    2917154              in progress          112:21:49

Is it normal for promotions to run on only one server and demotions on only the other, rather than both running on each?
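
In case it helps, here is a rough way to bucket those "demotion failed" messages by hour, to see whether the failures are constant or bursty. The default log path and the timestamp pattern are guesses (gluster log lines normally start with "[YYYY-MM-DD HH:MM:SS...]"); adjust both to your setup:

import collections
import re
import sys

# Tally "demotion failed" messages per hour from a tier log. The default
# path is a guess -- point it at wherever your tier daemon actually logs.
path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/glusterfs/gv0-tier.log"
stamp = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}):")  # assumed gluster log format
hours = collections.Counter()
with open(path, errors="replace") as f:
    for line in f:
        if "demotion failed" in line:
            m = stamp.match(line)
            hours[m.group(1) if m else "unknown"] += 1
for hour, count in sorted(hours.items()):
    print(hour, count)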

Volume info:

[root@pod-sjc1-gluster1 ~]# gluster volume info
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: pod-sjc1-gluster1:/data/brick1/gv0
Brick2: pod-sjc1-gluster2:/data/brick1/gv0
Brick3: pod-sjc1-gluster1:/data/brick2/gv0
Brick4: pod-sjc1-gluster2:/data/brick2/gv0
Brick5: pod-sjc1-gluster1:/data/brick3/gv0
Brick6: pod-sjc1-gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.allow-insecure: on
performance.flush-behind: on
performance.rda-cache-limit: 32MB
network.tcp-window-size: 1048576
performance.nfs.io-threads: on
performance.write-behind-window-size: 4MB
performance.nfs.write-behind-window-size: 512MB
performance.io-cache: on
performance.quick-read: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 4GB
server.event-threads: 16
client.event-threads: 16
features.barrier: disable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.lookup-optimize: on
server.outstanding-rpc-limit: 1024
auto-delete: enable


# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick pod-sjc1-gluster2:/data/
hot_tier/gv0                                49219     0          Y       26714
Brick pod-sjc1-gluster1:/data/
hot_tier/gv0                                49199     0          Y       21325
Cold Bricks:
Brick pod-sjc1-gluster1:/data/
brick1/gv0                                  49152     0          Y       3178 
Brick pod-sjc1-gluster2:/data/
brick1/gv0                                  49152     0          Y       4818 
Brick pod-sjc1-gluster1:/data/
brick2/gv0                                  49153     0          Y       3186 
Brick pod-sjc1-gluster2:/data/
brick2/gv0                                  49153     0          Y       4829 
Brick pod-sjc1-gluster1:/data/
brick3/gv0                                  49154     0          Y       3194 
Brick pod-sjc1-gluster2:/data/
brick3/gv0                                  49154     0          Y       4840 
Tier Daemon on localhost                    N/A       N/A        Y       20313
Self-heal Daemon on localhost               N/A       N/A        Y       32023
Tier Daemon on pod-sjc1-gluster1            N/A       N/A        Y       24758
Self-heal Daemon on pod-sjc1-gluster2       N/A       N/A        Y       12349
 
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
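
To test the table-lock theory directly, one could poll the per-brick heat DB with a trivial read-only query and watch for stalls at the 25-minute mark. The DB path below is a placeholder -- the gfdb sqlite file lives on each brick, but its exact name and location vary by version:

import sqlite3
import sys
import time

# Placeholder path -- point this at the brick's actual gfdb sqlite file.
db = sys.argv[1] if len(sys.argv) > 1 else "/data/brick1/gv0/.glusterfs/gv0.db"

while True:  # stop with Ctrl-C
    start = time.time()
    try:
        con = sqlite3.connect(f"file:{db}?mode=ro", uri=True, timeout=30)
        con.execute("SELECT count(*) FROM sqlite_master").fetchone()
        con.close()
        note = "ok"
    except sqlite3.OperationalError as e:
        note = f"error: {e}"
    print(f"{time.strftime('%H:%M:%S')} {time.time() - start:6.2f}s {note}")
    time.sleep(5)

If the query time spikes, or errors with "database is locked", exactly when the promotion daemon runs, the lock would be a good suspect.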
 

On Tue, Jan 9, 2018 at 10:33 PM, Hari Gowtham <hgowtham@xxxxxxxxxx> wrote:
Hi,

Can you send the volume info, volume status output, and the tier logs?
I also need to know the size of the files being stored.

On Tue, Jan 9, 2018 at 9:51 PM, Tom Fite <tomfite@xxxxxxxxx> wrote:
> I've recently enabled an SSD-backed 2 TB hot tier on my 150 TB
> distributed-replicate volume (2 servers, 3 bricks per server).
>
> I'm seeing IO get blocked across all client FUSE threads for 10 to 15
> seconds while the promotion daemon runs. The 'glustertierpro' thread
> jumps to 99% CPU usage on both boxes when these delays occur, and they
> happen every 25 minutes (matching my tier-promote-frequency of 1500 seconds).
>
> I suspect this has something to do with the sqlite heat database; maybe
> something is locked while it runs the query that determines which files to
> promote. My volume contains approximately 18 million files.
>
> Has anybody else seen this? I suspect these delays will get worse as I
> add more files to my volume, which will cause significant problems.
>
> Here are my hot tier settings:
>
> # gluster volume get gv0 all | grep tier
> cluster.tier-pause                      off
> cluster.tier-promote-frequency          1500
> cluster.tier-demote-frequency           3600
> cluster.tier-mode                       cache
> cluster.tier-max-promote-file-size      10485760
> cluster.tier-max-mb                     64000
> cluster.tier-max-files                  100000
> cluster.tier-query-limit                100
> cluster.tier-compact                    on
> cluster.tier-hot-compact-frequency      86400
> cluster.tier-cold-compact-frequency     86400
>
> # gluster volume get gv0 all | grep threshold
> cluster.write-freq-threshold            2
> cluster.read-freq-threshold             5
>
> # gluster volume get gv0 all | grep watermark
> cluster.watermark-hi                    92
> cluster.watermark-low                   75
>



--
Regards,
Hari Gowtham.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
