When you use a gluster volume as a VM datastore, it will contain big files such as the *.vmdk images, depending on how your VMs are configured. When gluster does a self-heal, I am not sure whether it replicates the whole file or only part of it. If the whole file is replicated, it will be a heavy burden on the system. Maybe you can consider a different way of using gluster.

From: Peter Linder <peter.linder at fiberdirekt.se>
Subject: Re: Problem with VM images when one node goes online (self-healing) on a 2 node replication gluster for VMware datastore
To: gluster-users at gluster.org
Message-ID: <4E9406E0.8000803 at fiberdirekt.se>

With 3.2.4, no operation is allowed on a file while it is being self-healed, so your VMs will stall and time out if the self-heal doesn't finish quickly enough. gluster 3.3 will fix this, but I don't know when it will be released. There are betas to try out though :). Perhaps somebody else can say how stable 3.3-beta2 is compared to 3.2.4?
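(A rough sketch of the CLI side of this discussion, not from the original thread: the volume name GLVOL1 is taken from Keith's configuration quoted below, and the "volume heal" commands assume one of the 3.3 betas, since they do not exist in 3.2.4. The data-self-heal-algorithm option is what decides whether a heal copies the whole image or only the blocks that differ; check "gluster volume set help" on your build before relying on it.)

    # "diff" makes self-heal checksum the file block by block and copy only
    # the blocks that changed, instead of re-copying the whole .vmdk ("full").
    gluster volume set GLVOL1 cluster.data-self-heal-algorithm diff

    # On the 3.3 betas, pending and completed heals can be inspected while
    # the volume stays online (granular locking):
    gluster volume heal GLVOL1 info
    gluster volume heal GLVOL1 info healed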
On 10/11/2011 10:57 AM, keith wrote:
> Hi all
>
> I am testing gluster-3.2.4 on a 2-node storage setup with replication
> as our VMware datastore.
>
> The setup runs replication across the 2 nodes with ucarp for IP
> failover, and the volume is mounted on VMware as an NFS datastore.
>
>> Volume Name: GLVOL1
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: t4-01.store:/EXPORT/GLVOL1
>> Brick2: t4-03.store:/EXPORT/GLVOL1
>> Options Reconfigured:
>> performance.cache-size: 4096MB
>
> High-availability testing went smoothly without any problem or data
> corruption: when either node is down, all VM guests keep running
> normally.
>
> The problem arises when I bring the failed node back up and it starts
> self-healing. All my VM guests get kernel error messages and finally
> end up with "EXT3-fs error: ext3_journal_start_sb: detected aborted
> journal" and remount their (root) filesystems read-only.
>
> Below are some of the VM guest kernel errors generated when I bring up
> the failed gluster node for self-healing:
>
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c90c0
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c9240
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c93c0
>> Oct 11 15:58:34 testvm3 kernel: INFO: task kjournald:2081 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: kjournald D ffff810001736420 0 2081 14 2494 2060 (L-TLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff81003c087cf0 0000000000000046 ffff810030ef2288 ffff81003f5d6048
>> Oct 11 15:58:34 testvm3 kernel: 00000000037685c8 000000000000000a ffff810037c53820 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 00001883cb68d47d 0000000000002c4e ffff810037c53a08 000000003f5128b8
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88033a41>] :jbd:journal_commit_transaction+0x553/0x10aa
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003d85b>] lock_timer_base+0x1b/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8004ad98>] try_to_del_timer_sync+0x7f/0x88
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88037662>] :jbd:kjournald+0xc1/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2dfd>] autoremove_wake_function+0x0/0x2e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff880375a1>] :jbd:kjournald+0x0/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032722>] kthread+0xfe/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032624>] kthread+0x0/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task crond:3418 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: crond D ffff810001736420 0 3418 1 3436 3405 (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810036c55ca8 0000000000000086 0000000000000000 ffffffff80019e3e
>> Oct 11 15:58:34 testvm3 kernel: 0000000000065bf2 0000000000000007 ffff81003ce4b080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018899ae16270 0000000000023110 ffff81003ce4b268 000000008804ec00
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063912>] __wait_on_bit_lock+0x36/0x66
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639ae>] out_of_line_wait_on_bit_lock+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8803181e>] :jbd:do_get_write_access+0x54/0x522
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88031d0e>] :jbd:journal_get_write_access+0x22/0x33
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804dd37>] :ext3:ext3_reserve_inode_write+0x38/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804ddb0>] :ext3:ext3_mark_inode_dirty+0x21/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88050d35>] :ext3:ext3_dirty_inode+0x63/0x7b
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80013d98>] __mark_inode_dirty+0x29/0x16e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80025a49>] filldir+0x0/0xb7
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003516b>] vfs_readdir+0x8c/0xa9
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800389db>] sys_getdents+0x75/0xbd
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task httpd:3452 blocked for more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: httpd D ffff810001736420 0 3452 3405 3453 (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9dc8 0000000000000086 0000000000000000 ffffffff80009a1c
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9e28 0000000000000009 ffff810037e52080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018839f75405c 000000000003363d ffff810037e52268 000000003f5e7150
>
> Please note that although I am using ucarp for IP failover, and by
> default ucarp always has a preferred master, I have added code to make
> sure that a ucarp master which goes down always comes back up as the
> slave. This ensures that VMware will not reconnect to the failed node
> when it comes back up.
>
> However, this does not prevent the problem I describe above.
>
> There are a lot of logs generated during the self-healing process. They
> don't make any sense to me. I am attaching them; they come to over
> 900k, so I zipped them up. Hopefully the mailing list allows the
> attachment.
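(A rough illustration of the ucarp arrangement described above, not Keith's actual scripts: the interface, addresses, VHID and password below are placeholders. The idea is to run ucarp on both nodes without the preempt option and to give a recovering node a higher advertisement skew, so it only becomes master again if the surviving node actually fails.)

    # Node that has just recovered: high advertisement skew (-k), no -P, so
    # it stays backup and does not pull the VIP away from the running node.
    ucarp -B -i eth0 -v 1 -p sharedsecret -a 192.0.2.100 -s 192.0.2.11 \
          -k 100 -u /etc/ucarp/vip-up.sh -d /etc/ucarp/vip-down.sh

    # Node currently holding the VIP: low skew, also without -P, so a
    # returning peer never forces a failback while VMware is connected.
    ucarp -B -i eth0 -v 1 -p sharedsecret -a 192.0.2.100 -s 192.0.2.12 \
          -k 1 -u /etc/ucarp/vip-up.sh -d /etc/ucarp/vip-down.sh

Keith's extra scripting presumably just restarts the recovered node's ucarp with the higher skew; the exact mechanism is not shown in the thread.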

> Are there any best practices for setting up and running gluster with
> replication as a VMware datastore, so that VM guests keep running
> smoothly even when one node goes into self-healing?
>
> Any advice is appreciated.
>
> Keith
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users