+Raghavendra and Manoj for their insights.
On 6 April 2018 at 13:53, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
I restarted rsync, and this has been sitting there for almost a minute, barely moved several bytes in that time:

2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.79226761-armeabi-v7a-175-minAPI14.apk
6,389,760 45% 18.76kB/s 0:06:50

I straced each of the 3 processes rsync created and saw this (note: every time there were several seconds of no output, I Ctrl-C'ed and detached from strace):

citadel:/home/archon810 # strace -p 16776
Process 16776 attached
select(6, [5], [4], [5], {45, 293510}) = 1 (out [4], left {44, 71342})
write(4, "\4\200\0\7\3513>\2755\360[\372\317\337DZ\36\324\300o\235\377\247\367\177%\37\226\352\377\256\351"..., 32776) = 32776
ioctl(1, TIOCGPGRP, [16776]) = 0
write(1, "\r 4,292,608 30% 27.07kB/"..., 46) = 46
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\270\224\277\24\31\247f\32\233x\t\276l\f-\254\r\246\324\360\30\235\350\6\34\304\230\242"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\346_\363\36\33\320}Dd\5_\327\250\237i\242?B\276e\245\202Z\213\301[\25S"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\330\303\221\357\225\37h\373\366X\306L\f>\234\\ %n\253\266\5\372c\257>V\366\255"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7i\301\17u\224{/O\213\330\33\317\272\246\221\22\261|w\244\5\307|\21\373\v\356k"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\270\277\233\206n\304:\362_\213~\356bm\5\350\337\26\203\225\332\277\372\275\247<\307\22"..., 32776) = 32776
read(3, "\316\214\260\341:\263P\214\373n\313\10\333 }\323\364Q\353\r\232d\204\257\\Q\306/\277\253/\356"..., 262144) = 262144
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\314\233\274S08\330\276\226\267\233\360rp\210x)\320\0314\223\323\3335Y\312\313\307"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\316\214\260\341:\263P\214\373n\313\10\333 }\323\364Q\353\r\232d\204\257\\Q\306/"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}^CProcess 16776 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16777
Process 16777 attached
select(4, [3], [], [3], {38, 210908}^CProcess 16777 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16776
Process 16776 attached
select(6, [5], [4], [5], {48, 295874}^CProcess 16776 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16778
Process 16778 attached
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999996})
read(0, "\0\200\0\0\4\200\0\7\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230 \16v"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999998})
read(0, "\373\30\2\2667\371\207)", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7\6\213\2223\233\36-\350,\303\0\234\7` \317\276H\353u\217\275\316\333@"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\375\33\367_\357\330\362\222", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7`Nv\355\275\336wzQ\365\264\364\20AX\365DG\372\311\216\212\375\276"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\374)\300\264}\21\226s", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7\10:\v\342O\305\374\5:Y+ \250\315\24\202J-@\256WC\320\371"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\3023\24O\343y\312\204", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7\27\22^\n/S.\215\362T\f\257Q\207z\241~ B\3\32\32344\17"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999998})
read(0, "\367P\222\262\224\17\25\250", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7FujR\213\372\341E\232\360\n\257\323\233>\364\245\37\3\31\314\20\206\362"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\203o\300\341\37\340(8", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999998})
read(0, "\0\200\0\0\4\200\0\7n\211\357\301\217\210\23\341$\342d8\25N\2035[\260\1\206B\206!\2"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "|\222\223\336\201w\325\356", 8) = 8
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7\220\216Y\343\362\366\231\372?\334N^\303\35\374cC;\vtx\231<w"..., 32768) = 32768
select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999998})
read(0, ";k\v\314\21\375\3\274", 8) = 8
write(3, "\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230 \16v\213O//\332\4\24\24"..., 262144^C

I'm really not sure what to make of this.
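One rough way to read a capture like the one above is to tally, per syscall, how many calls completed and how many bytes the read/write calls returned. Below is a minimal Python sketch of that, assuming the strace output has been saved with one call per line; `summarize_strace` and the abbreviated `SAMPLE` lines are illustrative helpers, not part of strace or rsync. It only looks at the integer after the final "=", so calls that were cut off by ^C (such as the last write(3, ..., 262144) of process 16778) are skipped, and timing is ignored entirely; strace's -T or -tt options would be needed to see how long each call took.

```python
import re
from collections import defaultdict

# Matches a completed call such as:
#   write(4, "\4\200..."..., 32776) = 32776
# Group 1 is the syscall name, group 2 the integer return value.
# Calls interrupted by ^C have no "= N" suffix and therefore do not match.
CALL_RE = re.compile(r"^(\w+)\(.*\)\s*=\s*(-?\d+)")


def summarize_strace(lines):
    """Return {syscall: [completed_calls, sum_of_positive_return_values]}.

    For read/write the return value is a byte count, so the second field
    is the number of bytes moved. For select it is the number of ready
    descriptors, which is less meaningful when summed.
    """
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = CALL_RE.match(line.strip())
        if not match:
            continue  # "Process N attached", prompts, interrupted calls, etc.
        name, ret = match.group(1), int(match.group(2))
        stats[name][0] += 1
        if ret > 0:  # negative values are errors, not byte counts
            stats[name][1] += ret
    return dict(stats)


# Abbreviated lines from the process 16778 capture (strings shortened).
SAMPLE = [
    r'select(1, [0], [], [0], {60, 0}) = 1 (in [0], left {59, 999996})',
    r'read(0, "\0\200\0\0\4\200\0\7"..., 32768) = 32768',
    r'read(0, "\373\30\2\2667\371\207)", 8) = 8',
    r'write(3, "\3508\343\204"..., 262144^C',
]

for name, (calls, total) in sorted(summarize_strace(SAMPLE).items()):
    print(f"{name:8} calls={calls} total_ret={total}")
```

Pointing it at a full saved capture (for example `summarize_strace(open("strace-16778.txt"))`, with a hypothetical file name) gives the same per-syscall totals for the whole run.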
In the time I wrote the above, the file still hasn't finished copying:

2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.79226761-armeabi-v7a-175-minAPI14.apk
10,321,920 73% 33.31kB/s 0:01:53

On Fri, Apr 6, 2018 at 1:12 AM, Artem Russakovskii <archon810@xxxxxxxxx> wrote:

Hi again,

I'd like to expand on the performance issues and plead for help. Here's one case that shows these odd hiccups: https://i.imgur.com/CXBPjTK.gifv

In this GIF, where I switch back and forth between copy operations on 2 servers, I'm copying a 10GB directory full of .apk and image files.

On server "hive", I'm copying straight from the main disk to an attached block volume (xfs). As you can see, the transfers are relatively speedy and don't hiccup.

On server "citadel", I'm copying the same set of data to a 4-replica Gluster volume that uses block storage as a brick. As you can see, performance is much worse, and there are frequent pauses of many seconds where nothing seems to be happening; it just freezes.

All 4 servers have the same specs, and all of them have performance issues with Gluster and no such issues when raw xfs block storage is used.

hive has long finished copying the data, while citadel is barely chugging along and is expected to take probably half an hour to an hour.
I have over 1TB of data to migrate, at which point, if we went live, I'm not even sure Gluster would be able to keep up rather than bring the machines and services down.

Here's the cluster config, though it didn't seem to make any difference performance-wise before vs. after I applied the customizations:

Volume Name: apkmirror_data1
Type: Replicate
Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
Brick2: forge:/mnt/forge_block1/apkmirror_data1
Brick3: hive:/mnt/hive_block1/apkmirror_data1
Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 5
network.remote-dio: enable
performance.rda-cache-limit: 256MB
performance.readdir-ahead: on
performance.parallel-readdir: on
network.inode-lru-limit: 500000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.io-thread-count: 32
server.event-threads: 4
client.event-threads: 4
performance.read-ahead: off
cluster.lookup-optimize: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on

The mounts are done as follows in /etc/fstab:

/dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0

I'm really not sure whether direct-io-mode mount tweaks would do anything here, what the value should be set to, or what it is by default.

The OS is OpenSUSE 42.3, 64-bit.
80GB of RAM, 20 CPUs, hosted by Linode.

I'd really appreciate any help in the matter.

Thank you.

On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <archon810@xxxxxxxxx> wrote:

Hi,

I'm trying to squeeze performance out of Gluster on 4 machines with 80GB of RAM and 20 CPUs each, where Gluster runs on attached block storage (Linode) as 4 replicated bricks, and so far everything I've tried results in sub-optimal performance.

There are many files (mostly images, several million), and many operations take minutes. Copying multiple files (even small ones) suddenly freezes up for seconds at a time, then continues; iostat frequently shows large r_await and w_await values with 100% utilization for the attached block device, etc.

Anyway, there are many guides out there for small-file performance improvements, but more explanation is needed, and I think more tweaks should be possible.

My question today is about performance.cache-size. Is this the size of a cache in RAM? If so, how do I view the current cache size to see whether it gets full and I should increase it? Is it advisable to bump it up if I have many tens of gigs of RAM free?

More generally, in the 2 months since I first started working with Gluster and set a production system live, I've been feeling frustrated because Gluster has a lot of poorly documented and confusing options. I really wish the documentation could be improved with examples and better explanations.

Specifically, it'd be absolutely amazing if the docs offered a strategy for setting each value and ways of determining more optimal values. For example, for performance.cache-size, if it said something like "run command abc to see your current cache size, and if it's hurting, up it, but be aware that it's limited by RAM," that alone would be a huge improvement to the docs. And so on with other options.

The Gluster team is quite helpful on this mailing list, but in a reactive rather than proactive way.
Perhaps it's tunnel vision once you've worked on a project for so long, where less technical explanations and even proper documentation of options take a back seat, but I encourage you to be more proactive about helping us understand and optimize Gluster.

Thank you.
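To confirm which entries in the "Options Reconfigured:" list of the volume config above actually changed between two tuning attempts, the volume info output can be saved before and after and compared. Below is a minimal Python sketch, assuming the output is captured as plain text; `parse_options`, `diff_options`, and the `BEFORE` snapshot are hypothetical (the `BEFORE` values are made up for illustration, while the `AFTER` lines are taken from the config above). It treats a blank line as the end of the options section, which is an assumption about the output layout.

```python
def parse_options(volume_info: str) -> dict:
    """Collect 'key: value' pairs listed after 'Options Reconfigured:'."""
    options = {}
    in_options = False
    for raw in volume_info.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
        elif in_options:
            if not line:
                break  # assumption: a blank line ends this volume's option list
            key, sep, value = line.partition(": ")
            if sep:
                options[key] = value
    return options


def diff_options(before: dict, after: dict) -> dict:
    """Map each option whose value differs to (before, after); None means unset."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }


# Hypothetical "before" snapshot, for illustration only.
BEFORE = """Options Reconfigured:
transport.address-family: inet
nfs.disable: on
"""

# A few lines from the "after" config quoted in this thread.
AFTER = """Options Reconfigured:
performance.cache-size: 1GB
performance.io-thread-count: 32
transport.address-family: inet
nfs.disable: on
"""

for key, (old, new) in diff_options(parse_options(BEFORE), parse_options(AFTER)).items():
    print(f"{key}: {old} -> {new}")
```

Run against real before/after captures, this prints one line per option that changed, which makes it easier to attribute (or rule out) a performance difference to specific settings.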
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users