When you use a gluster volume as a VM datastore, it will contain big files such as the *.vmdk images, depending on how your VMs are configured. When gluster does a self-heal, I am not sure whether it replicates the whole file or only part of it. If the whole file is replicated, it will be a heavy burden on the system. Maybe you can consider a different way of using gluster.

From: Peter Linder <peter.linder at fiberdirekt.se>
Subject: Re: Problem with VM images when one node goes online (self-healing) on a 2 node replication gluster for VMware datastore
To: gluster-users at gluster.org
Message-ID: <4E9406E0.8000803 at fiberdirekt.se>

With 3.2.4, no operation is allowed on a file while it is being self-healed, so your VMs will stall and time out if the self-heal doesn't finish quickly enough. gluster 3.3 will fix this, but I don't know when it will be released. There are betas to try out though :). Perhaps somebody else can say how stable 3.3-beta2 is compared to 3.2.4?
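(A rough sketch of the CLI side of this discussion, not from the original thread: the volume name GLVOL1 is taken from Keith's configuration quoted below, and the "volume heal" commands assume one of the 3.3 betas, since they do not exist in 3.2.4. The data-self-heal-algorithm option is what decides whether a heal copies the whole image or only the blocks that differ; check "gluster volume set help" on your build before relying on it.)

    # "diff" makes self-heal checksum the file block by block and copy only
    # the blocks that changed, instead of re-copying the whole .vmdk ("full").
    gluster volume set GLVOL1 cluster.data-self-heal-algorithm diff

    # On the 3.3 betas, pending and completed heals can be inspected while
    # the volume stays online (granular locking):
    gluster volume heal GLVOL1 info
    gluster volume heal GLVOL1 info healed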
On 10/11/2011 10:57 AM, keith wrote:
> Hi all
>
> I am testing gluster-3.2.4 on a 2-node storage setup with replication
> as our VMware datastore.
>
> The setup runs replication across the 2 nodes with ucarp for IP
> failover, and the volume is mounted on VMware as an NFS datastore.
>
>> Volume Name: GLVOL1
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: t4-01.store:/EXPORT/GLVOL1
>> Brick2: t4-03.store:/EXPORT/GLVOL1
>> Options Reconfigured:
>> performance.cache-size: 4096MB
>
> High-availability testing went smoothly without any problem or data
> corruption: when either node is down, all VM guests keep running
> normally.
>
> The problem arises when I bring the failed node back up and it starts
> self-healing. All my VM guests get kernel error messages and finally
> end up with "EXT3-fs error: ext3_journal_start_sb: detected aborted
> journal" and remount their (root) filesystems read-only.
>
> Below are some of the VM guest kernel errors generated when I bring up
> the failed gluster node for self-healing:
>
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c90c0
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c9240
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c93c0
>> Oct 11 15:58:34 testvm3 kernel: INFO: task kjournald:2081 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: kjournald D ffff810001736420 0 2081 14 2494 2060 (L-TLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff81003c087cf0 0000000000000046 ffff810030ef2288 ffff81003f5d6048
>> Oct 11 15:58:34 testvm3 kernel: 00000000037685c8 000000000000000a ffff810037c53820 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 00001883cb68d47d 0000000000002c4e ffff810037c53a08 000000003f5128b8
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88033a41>] :jbd:journal_commit_transaction+0x553/0x10aa
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003d85b>] lock_timer_base+0x1b/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8004ad98>] try_to_del_timer_sync+0x7f/0x88
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88037662>] :jbd:kjournald+0xc1/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2dfd>] autoremove_wake_function+0x0/0x2e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff880375a1>] :jbd:kjournald+0x0/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032722>] kthread+0xfe/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032624>] kthread+0x0/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task crond:3418 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: crond D ffff810001736420 0 3418 1 3436 3405 (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810036c55ca8 0000000000000086 0000000000000000 ffffffff80019e3e
>> Oct 11 15:58:34 testvm3 kernel: 0000000000065bf2 0000000000000007 ffff81003ce4b080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018899ae16270 0000000000023110 ffff81003ce4b268 000000008804ec00
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063912>] __wait_on_bit_lock+0x36/0x66
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639ae>] out_of_line_wait_on_bit_lock+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8803181e>] :jbd:do_get_write_access+0x54/0x522
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88031d0e>] :jbd:journal_get_write_access+0x22/0x33
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804dd37>] :ext3:ext3_reserve_inode_write+0x38/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804ddb0>] :ext3:ext3_mark_inode_dirty+0x21/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88050d35>] :ext3:ext3_dirty_inode+0x63/0x7b
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80013d98>] __mark_inode_dirty+0x29/0x16e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80025a49>] filldir+0x0/0xb7
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003516b>] vfs_readdir+0x8c/0xa9
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800389db>] sys_getdents+0x75/0xbd
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task httpd:3452 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: httpd D ffff810001736420 0 3452 3405 3453 (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9dc8 0000000000000086 0000000000000000 ffffffff80009a1c
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9e28 0000000000000009 ffff810037e52080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018839f75405c 000000000003363d ffff810037e52268 000000003f5e7150
>
> Please note that although I am using ucarp for IP failover, and by
> default ucarp always has a preferred master, I have added code to make
> sure that a ucarp master which goes down always comes back up as the
> slave. This ensures that VMware will not reconnect to the failed node
> when it comes back up.
>
> However, this does not prevent the problem I describe above.
>
> There are a lot of logs generated during the self-healing process. They
> don't make any sense to me. I am attaching them; they come to over
> 900k, so I zipped them up. Hopefully the mailing list allows the
> attachment.
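(A rough illustration of the ucarp arrangement described above, not Keith's actual scripts: the interface, addresses, VHID and password below are placeholders. The idea is to run ucarp on both nodes without the preempt option and to give a recovering node a higher advertisement skew, so it only becomes master again if the surviving node actually fails.)

    # Node that has just recovered: high advertisement skew (-k), no -P, so
    # it stays backup and does not pull the VIP away from the running node.
    ucarp -B -i eth0 -v 1 -p sharedsecret -a 192.0.2.100 -s 192.0.2.11 \
          -k 100 -u /etc/ucarp/vip-up.sh -d /etc/ucarp/vip-down.sh

    # Node currently holding the VIP: low skew, also without -P, so a
    # returning peer never forces a failback while VMware is connected.
    ucarp -B -i eth0 -v 1 -p sharedsecret -a 192.0.2.100 -s 192.0.2.12 \
          -k 1 -u /etc/ucarp/vip-up.sh -d /etc/ucarp/vip-down.sh

Keith's extra scripting presumably just restarts the recovered node's ucarp with the higher skew; the exact mechanism is not shown in the thread.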

> Are there any best practices for setting up and running gluster with
> replication as a VMware datastore, so that VM guests keep running
> smoothly even when one node goes into self-healing?
>
> Any advice is appreciated.
>
> Keith
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users