I'll definitely try to collect some more data on this when I run a fix-layout.

> On Feb 27, 2014, at 10:34 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
>
> Nick, Joao,
> Data self-heal in afr is designed to be I/O friendly. Could you please help us identify the root cause of the I/O lock-up and fix it if possible?
> It seems this problem only happens for some of the VMs and not all. Do you think we can find steps to re-create the problem consistently? Maybe with a predefined workload inside the VM that triggers the problem while self-heal is in progress?
>
> Pranith.
> ----- Original Message -----
>> From: "Nick Majeran" <nmajeran@xxxxxxxxx>
>> To: "João Pagaime" <joao.pagaime@xxxxxxxxx>
>> Cc: Gluster-users@xxxxxxxxxxx
>> Sent: Thursday, February 27, 2014 6:43:07 PM
>> Subject: Re: self-heal stops some vms (virtual machines)
>>
>> I've had similar issues adding bricks and running a fix-layout as well.
>>
>>> On Feb 27, 2014, at 3:56 AM, João Pagaime <joao.pagaime@xxxxxxxxx> wrote:
>>>
>>> yes, a real problem, enough to start thinking really hard about architecture scenarios
>>>
>>> sorry, but I can't share any solutions at this time
>>>
>>> one (complicated) workaround would be to "medically induce a coma" on a VM
>>> as the self-heal starts on it, and resurrect it afterwards.
>>> I mean something like this:
>>> $ virsh suspend <vm-id>
>>> (do self-heal on the VM's disks)
>>> $ virsh resume <vm-id>
>>> problems: several, including for the VM users, but better than a kernel
>>> lock-up. Feasibility problem: how to detect efficiently when the self-heal
>>> starts on a specific file on the brick
>>>
>>> another related problem may be how to mitigate IO starvation on the brick
>>> when self-healing kicks in, since that process may be an IO hog. But I think
>>> this is a lesser problem
>>>
>>> best regards
>>> Joao
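A rough sketch of the suspend-during-heal workaround João describes above, for anyone who wants to experiment with it. The volume name, image path and domain name below are hypothetical placeholders, and it assumes the image appears in the output of "gluster volume heal <volume> info" (available since 3.3) while it is still pending heal; there is no guarantee that happens early enough to pause the guest before it sees I/O timeouts, so treat this as a starting point rather than a fix.

    #!/bin/bash
    # Hypothetical values; adjust to your environment. Run somewhere that has
    # both the gluster CLI (a peer node) and virsh access to the hypervisor.
    VOL=gv_pri                    # replicated volume holding the VM images
    IMG=images/fwrt2.qcow2        # image path as it appears inside the volume
    VM=fwrt2                      # libvirt domain name

    # Wait until the image shows up in the heal queue, then pause the guest.
    until gluster volume heal "$VOL" info | grep -q "$IMG"; do
        sleep 10
    done
    virsh suspend "$VM"

    # Resume the guest once the image is no longer listed as pending heal.
    while gluster volume heal "$VOL" info | grep -q "$IMG"; do
        sleep 30
    done
    virsh resume "$VM"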
>>> On 27-02-2014 09:08, Fabio Rosati wrote:
>>>> Hi All,
>>>>
>>>> I ran into exactly the same problem encountered by Joao.
>>>> After rebooting one of the GlusterFS nodes, self-heal starts and some VMs
>>>> can't access their disk images anymore.
>>>>
>>>> Logs from one of the VMs after one gluster node was rebooted:
>>>>
>>>> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-2): __ext4_get_inode_loc: unable to read inode block - inode=2145, block=417
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15032608
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307552
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307568
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 12972672
>>>> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-1): ext4_find_entry: reading directory #123 offset 0
>>>> Feb 25 23:35:47 fwrt2 kernel: Core dump to |/usr/libexec/abrt-hook-ccpp 7 0 2757 0 23 1393367747 e pipe failed
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 9250632
>>>> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30536)
>>>> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30544)
>>>> [...]
>>>>
>>>> A few hours later the VM seemed to be frozen and I had to kill and restart
>>>> it; no more problems after the reboot.
>>>>
>>>> This is the volume layout:
>>>>
>>>> Volume Name: gv_pri
>>>> Type: Distributed-Replicate
>>>> Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
>>>> Status: Started
>>>> Number of Bricks: 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: nw1glus.gem.local:/glustexp/pri1/brick
>>>> Brick2: nw2glus.gem.local:/glustexp/pri1/brick
>>>> Brick3: nw3glus.gem.local:/glustexp/pri2/brick
>>>> Brick4: nw4glus.gem.local:/glustexp/pri2/brick
>>>> Options Reconfigured:
>>>> storage.owner-gid: 107
>>>> storage.owner-uid: 107
>>>> server.allow-insecure: on
>>>> network.remote-dio: on
>>>> performance.write-behind-window-size: 16MB
>>>> performance.cache-size: 128MB
>>>>
>>>> OS: CentOS 6.5
>>>> GlusterFS version: 3.4.2
>>>>
>>>> The qemu-kvm VMs access their qcow2 disk images using the native Gluster
>>>> support (no fuse mount).
>>>> In the Gluster logs I didn't find anything special logged during self-heal,
>>>> but I can post them if needed.
>>>>
>>>> Does anyone have an idea of what can cause these problems?
>>>>
>>>> Thank you
>>>> Fabio
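For readers not familiar with it, the "native Gluster support" Fabio mentions is QEMU's built-in gluster block driver (libgfapi, QEMU 1.3 and later), which opens the image directly against the volume instead of going through a FUSE mount; the server.allow-insecure option in the volume above is commonly set so such clients can connect from non-privileged ports. A hedged sketch of what that access path looks like follows; the host name, volume and image path are borrowed from the volume info above purely as placeholders.

    # Create a qcow2 image directly on the Gluster volume (requires a QEMU
    # build with glusterfs support).
    qemu-img create -f qcow2 gluster://nw1glus.gem.local/gv_pri/images/test.qcow2 20G

    # Boot a guest against the image over libgfapi, bypassing any FUSE mount.
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=gluster://nw1glus.gem.local/gv_pri/images/test.qcow2,if=virtio,cache=none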
>>>> ----- Original Message -----
>>>> From: "João Pagaime" <joao.pagaime@xxxxxxxxx>
>>>> To: Gluster-users@xxxxxxxxxxx
>>>> Sent: Friday, February 7, 2014 13:13:59
>>>> Subject: self-heal stops some vms (virtual machines)
>>>>
>>>> hello all
>>>>
>>>> I have a replicated volume that holds kvm VMs (virtual machines)
>>>>
>>>> I had to stop one gluster server for maintenance. That part of the
>>>> operation went well: no VM problems after the shutdown
>>>>
>>>> the problems started after booting the gluster server again. Self-healing
>>>> started as expected, but some VMs locked up with disk problems
>>>> (time-outs) as self-healing reached them.
>>>> Some VMs did survive the self-healing, I suppose the ones with low IO
>>>> activity or those less sensitive to disk problems
>>>>
>>>> is there a specific gluster configuration that lets running VMs ride
>>>> through self-healing? (cluster.data-self-heal-algorithm is already set
>>>> to diff)
>>>>
>>>> are there any recommended tweaks for VMs running on top of gluster?
>>>>
>>>> current config:
>>>>
>>>> gluster: 3.3.0-1.el6.x86_64
>>>>
>>>> --------------------- volume:
>>>> # gluster volume info VOL
>>>>
>>>> Volume Name: VOL
>>>> Type: Distributed-Replicate
>>>> Volume ID: f44182d9-24eb-4953-9cdd-71464f9517e0
>>>> Status: Started
>>>> Number of Bricks: 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: one-gluster01:/san02-v2
>>>> Brick2: one-gluster02:/san02-v2
>>>> Brick3: one-gluster01:/san03
>>>> Brick4: one-gluster02:/san04
>>>> Options Reconfigured:
>>>> diagnostics.count-fop-hits: on
>>>> diagnostics.latency-measurement: on
>>>> nfs.disable: on
>>>> auth.allow: x
>>>> performance.flush-behind: off
>>>> cluster.self-heal-window-size: 1
>>>> performance.cache-size: 67108864
>>>> cluster.data-self-heal-algorithm: diff
>>>> performance.io-thread-count: 32
>>>> cluster.min-free-disk: 250GB
>>>>
>>>> thanks,
>>>> best regards,
>>>> joao
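On João's question above about letting running VMs ride through self-healing: there is no single switch for that in these releases as far as I know, but a few AFR options that already exist in 3.3/3.4 are commonly experimented with on VM-image volumes. The commands below are a hedged sketch rather than a verified recommendation; VOL stands for the volume name, and whether any of these actually prevents the lock-ups reported in this thread is untested.

    # Keep the diff algorithm and a small heal window (both already set above).
    gluster volume set VOL cluster.data-self-heal-algorithm diff
    gluster volume set VOL cluster.self-heal-window-size 1

    # Reduce how many files a client will heal in the background at once.
    gluster volume set VOL cluster.background-self-heal-count 4

    # Leave data heals to the self-heal daemon, so they are not triggered
    # from the VM's own I/O path on the client mount.
    gluster volume set VOL cluster.data-self-heal off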