Hello,

I am using a GlusterFS disperse volume to host QEMU images. Previously I had used a distribute-replicate volume, but a disperse volume seemed like a better fit for us. I created a volume with 11 bricks (redundancy 3).

During testing we have run into ongoing problems. Mainly, we see what appear to be severe glusterfs hangs, with many kernel log messages like the following:

    INFO: task glusterfs:4359 blocked for more than 120 seconds.
          Tainted: P           ---------------    2.6.32-37-pve #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    glusterfs     D ffff8803130d8040     0  4359      1    0 0x00000000
     ffff88031311db98 0000000000000086 0000000000000000 ffffffff00000000
     ffff88033fc0ad00 ffff8803130d8040 ffff88033e6eaf80 ffff88002820ffb0
     00016cebb4c94040 0000000000000006 0000000117e5c6b9 000000000000091c
    Call Trace:
     [<ffffffff81139590>] ? sync_page+0x0/0x50
     [<ffffffff815616c3>] io_schedule+0x73/0xc0
     [<ffffffff811395cb>] sync_page+0x3b/0x50
     [<ffffffff8156249b>] __wait_on_bit_lock+0x5b/0xc0
     [<ffffffff81139567>] __lock_page+0x67/0x70
     [<ffffffff810a6910>] ? wake_bit_function+0x0/0x50
     [<ffffffff8115323b>] invalidate_inode_pages2_range+0x11b/0x380
     [<ffffffffa016da80>] ? fuse_inode_eq+0x0/0x20 [fuse]
     [<ffffffff811ccb54>] ? ifind+0x74/0xd0
     [<ffffffffa016fa10>] fuse_reverse_inval_inode+0x70/0xa0 [fuse]
     [<ffffffffa01629ae>] fuse_dev_do_write+0x50e/0x6d0 [fuse]
     [<ffffffff811ad81e>] ? do_sync_read+0xfe/0x140
     [<ffffffffa0162ed9>] fuse_dev_write+0x69/0x80 [fuse]
     [<ffffffff811ad6cc>] do_sync_write+0xec/0x140
     [<ffffffff811adf01>] vfs_write+0xa1/0x190
     [<ffffffff811ae25a>] sys_write+0x4a/0x90
     [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b

This plays havoc with the virtual machines. In addition, read/write performance bogs down much sooner than we would expect, even under light load. The bricks are distributed among 4 servers connected by bonded gigabit Ethernet (LACP).
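For reference, the basic arithmetic behind an 11-brick, redundancy-3 disperse layout can be sketched in a few lines of Python. This is a rough sketch, not anything taken from GlusterFS itself: the function name `disperse_geometry` is hypothetical, and the 512-byte per-brick fragment size (which determines the stripe size) is an assumption based on my reading of the GlusterFS documentation, so check it against your version.

```python
# Rough sketch of the layout arithmetic for a GlusterFS disperse volume.
# ASSUMPTION: each stripe is split into (bricks - redundancy) data
# fragments of 512 bytes each, plus `redundancy` parity fragments.

FRAGMENT_BYTES = 512  # assumed per-brick fragment size


def disperse_geometry(bricks: int, redundancy: int) -> dict:
    """Return basic layout figures for a disperse volume (hypothetical helper)."""
    # Disperse volumes need redundancy > 0 and more than 2 * redundancy bricks.
    if redundancy <= 0 or 2 * redundancy >= bricks:
        raise ValueError("need redundancy > 0 and bricks > 2 * redundancy")
    data = bricks - redundancy
    return {
        "data_bricks": data,                    # bricks holding data fragments
        "tolerated_failures": redundancy,       # bricks that can be lost
        "usable_fraction": data / bricks,       # share of raw capacity usable
        "stripe_bytes": data * FRAGMENT_BYTES,  # assumed full-stripe size
    }


if __name__ == "__main__":
    print(disperse_geometry(11, 3))
```

For this volume (1 x (8 + 3)) it reports 8 data bricks, tolerance of 3 brick failures, roughly 73% usable capacity, and, under the fragment-size assumption above, a 4096-byte stripe.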
For our application the slowdowns are not a major problem, but they are an irritation. I have been trying different combinations of volume options to address this, and happened on one that seems to have resolved both issues: on a whim, I disabled performance.io-cache. Client access to the volume now seems close to wire speed, at least for large-file reads and writes.

From the documentation, it seems that performance.io-cache would not be of much benefit to our workload, but it is strange that it would cause all of the issues we have been having. Is this expected behavior for disperse volumes? We had planned to move another volume to the disperse configuration, and I'd like to have a good handle on which options are good or bad.

BTW: the options selected are really just based on trial and error, with some not-very-rigorous testing. My volume info is below:

    Volume Name: oort
    Type: Disperse
    Volume ID: 9b8702b2-3901-4cdf-b839-b17a06017f66
    Status: Started
    Number of Bricks: 1 x (8 + 3) = 11
    Transport-type: tcp
    Bricks:
    Brick1: XXXXX:/export/oort-brick-1/brick
    Brick2: XXXXX:/export/oort-brick-2/brick
    Brick3: XXXXX:/export/oort-brick-3/brick
    Brick4: XXXXX:/export/oort-brick-4/brick
    Brick5: XXXXX:/export/oort-brick-5/brick
    Brick6: XXXXX:/export/oort-brick-6/brick
    Brick7: XXXXX:/export/oort-brick-7/brick
    Brick8: XXXXX:/export/oort-brick-8/brick
    Brick9: XXXXX:/export/oort-brick-9/brick
    Brick10: XXXXX:/export/oort-brick-10/brick
    Brick11: XXXXX:/export/oort-brick-11/brick
    Options Reconfigured:
    transport.keepalive: on
    server.allow-insecure: on
    cluster.server-quorum-type: server
    cluster.quorum-type: auto
    network.remote-dio: on
    cluster.eager-lock: on
    cluster.readdir-optimize: on
    features.lock-heal: on
    performance.stat-prefetch: on
    performance.cache-size: 128MB
    performance.io-thread-count: 64
    performance.read-ahead: off
    performance.write-behind: on
    performance.io-cache: off
    performance.quick-read: off
    performance.flush-behind: on
    performance.write-behind-window-size: 2MB

Thank you for any help you can provide.

Regards,
Sherwin
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users