Any updates on this one?
On Mon, Feb 5, 2018 at 8:18 AM, Tom Fite <tomfite@xxxxxxxxx> wrote:
Hi all,

I have seen this issue as well, on Gluster 3.12.1 (3 bricks per box, 2 boxes, distributed-replicate). My testing shows the same thing -- running a find on a directory dramatically increases lstat performance.

To add another clue, the performance degrades again after issuing a call to reset the system's cache of dentries and inodes:

# sync; echo 2 > /proc/sys/vm/drop_caches

I think this shows that it's the system cache that's actually doing the heavy lifting here. There are a couple of sysctl tunables that I've found help out with this.

See here:

Contrary to what that doc says, I've found that setting vm.vfs_cache_pressure to a low value increases performance by allowing more dentries and inodes to be retained in the cache.

# Set the swappiness to avoid swap when possible.
vm.swappiness = 10

# Set the cache pressure to prefer inode and dentry cache over file cache. This is done to keep as many
# dentries and inodes in cache as possible, which dramatically improves gluster small file performance.
vm.vfs_cache_pressure = 25

For comparison, my config is:

Volume Name: gv0
Type: Tier
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: gluster2:/data/hot_tier/gv0
Brick2: gluster1:/data/hot_tier/gv0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: gluster1:/data/brick1/gv0
Brick4: gluster2:/data/brick1/gv0
Brick5: gluster1:/data/brick2/gv0
Brick6: gluster2:/data/brick2/gv0
Brick7: gluster1:/data/brick3/gv0
Brick8: gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-max-file-size: 128MB
cluster.readdir-optimize: on
cluster.watermark-hi: 95
features.ctr-sql-db-cachesize: 262144
cluster.read-freq-threshold: 5
cluster.write-freq-threshold: 2
features.record-counters: on
cluster.tier-promote-frequency: 15000
cluster.tier-pause: off
cluster.tier-compact: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.outstanding-rpc-limit: 2056
cluster.lookup-optimize: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.barrier: disable
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 1GB
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.quick-read: on
performance.io-cache: on
performance.nfs.write-behind-window-size: 4MB
performance.write-behind-window-size: 4MB
performance.nfs.io-threads: off
network.tcp-window-size: 1048576
performance.rda-cache-limit: 64MB
performance.flush-behind: on
server.allow-insecure: on
cluster.tier-demote-frequency: 18000
cluster.tier-max-files: 1000000
cluster.tier-max-promote-file-size: 10485760
cluster.tier-max-mb: 64000
features.ctr-sql-db-wal-autocheckpoint: 2500
cluster.tier-hot-compact-frequency: 86400
cluster.tier-cold-compact-frequency: 86400
performance.readdir-ahead: off
cluster.watermark-low: 50
storage.build-pgfid: on
performance.rda-request-size: 128KB
performance.rda-low-wmark: 4KB
cluster.min-free-disk: 5%
auto-delete: enable
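To make the two tunables Tom mentions persist across reboots, something like the following would do (a minimal sketch; the snippet file name under /etc/sysctl.d is arbitrary):

    # Drop the settings into a sysctl snippet and reload them.
    cat > /etc/sysctl.d/90-gluster-cache.conf <<'EOF'
    # Avoid swapping when possible.
    vm.swappiness = 10
    # Prefer keeping dentries/inodes cached over the page cache.
    vm.vfs_cache_pressure = 25
    EOF
    sysctl --system    # or: sysctl -p /etc/sysctl.d/90-gluster-cache.conf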
On Sun, Feb 4, 2018 at 9:44 PM, Amar Tumballi <atumball@xxxxxxxxxx> wrote:

Thanks for the report Artem,

Looks like the issue is about cache warming up. Specifically, I suspect rsync is doing a 'readdir(), stat(), file operations' loop, whereas when a find or ls is issued, we get a 'readdirp()' request, which carries the stat information along with the entries and also makes sure the cache is up-to-date (at the md-cache layer).

Note that this is just an off-the-top-of-my-head hypothesis; we surely need to analyse and debug more thoroughly for a proper explanation. Someone on my team will look at it soon.

Regards,
Amar
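One rough way to check this hypothesis on a test mount is to compare rsync's syscall profile on a cold cache against a run after a recursive listing has warmed things up (the mount path and destination below are placeholders, not from anyone's setup in this thread):

    # Cold cache: the time spent per lstat should be high.
    strace -f -c -e trace=lstat,getdents,getdents64 \
        rsync -an /mnt/glusterfs/testdir/ /tmp/rsync-dest/
    # Warm the caches with a readdirp-heavy crawl, then repeat.
    ls -lR /mnt/glusterfs/testdir > /dev/null
    strace -f -c -e trace=lstat,getdents,getdents64 \
        rsync -an /mnt/glusterfs/testdir/ /tmp/rsync-dest/
    # The lstat call count stays roughly the same; what should change is the time per call.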
On Mon, Feb 5, 2018 at 7:25 AM, Vlad Kopylov <vladkopy@xxxxxxxxx> wrote:

Are you mounting it to the local bricks?

I'm struggling with the same performance issues.
Try the volume settings from this thread:
http://lists.gluster.org/pipermail/gluster-users/2018-January/033397.html

performance.stat-prefetch: on might be it.

It seems like once it gets into the cache it is fast -- it's those stat fetches, which seem to come from .gluster, that are slow.
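For what it's worth, toggling and checking that option only takes (the volume name gv0 is an example; substitute your own):

    gluster volume set gv0 performance.stat-prefetch on
    gluster volume get gv0 performance.stat-prefetch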
On Sun, Feb 4, 2018 at 3:45 AM, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
> An update, and a very interesting one!
>
> After I started stracing rsync, all I could see was lstat calls, quite slow
> ones, over and over, which is expected.
>
> For example:
> lstat("uploads/2016/10/nexus2cee_DSC05339_thumb-161x107.jpg",
> {st_mode=S_IFREG|0664, st_size=4043, ...}) = 0
>
> I googled around and found
> https://gist.github.com/nh2/1836415489e2132cf85ed3832105fcc1 , which is
> seeing this exact issue with gluster, rsync and xfs.
>
> Here's the craziest finding so far. If while rsync is running (or right
> before), I run /bin/ls or find on the same gluster dirs, it immediately
> speeds up rsync by a factor of 100 or maybe even 1000. It's absolutely
> insane.
>
> I'm stracing the rsync run, and the slow lstat calls flood in at an
> incredible speed as soon as ls or find runs. Several hundred files per
> minute (excruciatingly slow) becomes thousands or even tens of thousands of
> files a second.
>
> What do you make of this?
>
>
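Based on that observation, a crude interim workaround would be to keep a cache-warming crawl running in the background alongside the transfer (paths below are placeholders, not from Artem's setup):

    # Warm the dentry/attribute caches in the background, then start the real copy.
    ls -lR /mnt/glusterfs/uploads > /dev/null 2>&1 &
    rsync -av /mnt/glusterfs/uploads/ /backup/uploads/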
Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users