I'll definitely try to collect some more data on this when I run a fix-layout.

> On Feb 27, 2014, at 10:34 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
>
> Nick, Joao,
> Data self-heal in afr is designed to be I/O friendly. Could you please help us identify the root cause of the I/O lock-up and fix it if possible?
> It seems this problem only happens for some of the VMs and not all. Do you think we can find steps to re-create the problem consistently? Maybe with a predefined workload inside the VM that triggers the problem while self-heal is in progress?
>
> Pranith.
> ----- Original Message -----
>> From: "Nick Majeran" <nmajeran@xxxxxxxxx>
>> To: "João Pagaime" <joao.pagaime@xxxxxxxxx>
>> Cc: Gluster-users@xxxxxxxxxxx
>> Sent: Thursday, February 27, 2014 6:43:07 PM
>> Subject: Re: self-heal stops some vms (virtual machines)
>>
>> I've had similar issues adding bricks and running a fix-layout as well.
>>
>>> On Feb 27, 2014, at 3:56 AM, João Pagaime <joao.pagaime@xxxxxxxxx> wrote:
>>>
>>> yes, a real problem, enough to start thinking really hard about architecture scenarios
>>>
>>> sorry, but I can't share any solutions at this time
>>>
>>> one (complicated) workaround would be to "medically induce a coma" on a VM
>>> as the self-heal starts on it, and resurrect it afterwards.
>>> I mean something like this:
>>> $ virsh suspend <vm-id>
>>> (do self-heal on the VM's disks)
>>> $ virsh resume <vm-id>
>>> problems: several, including for the VM users, but better than a kernel
>>> lock-up. Feasibility problem: how to detect efficiently when the self-heal
>>> starts on a specific file on the brick
>>>
>>> another related problem may be how to mitigate IO starvation on the brick
>>> when self-healing kicks in, since that process may be an IO hog. But I think
>>> this is a lesser problem
>>>
>>> best regards
>>> Joao
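A rough sketch of the suspend-during-heal workaround João describes above, for anyone who wants to experiment with it. The volume name, image path and domain name below are hypothetical placeholders, and it assumes the image appears in the output of "gluster volume heal <volume> info" (available since 3.3) while it is still pending heal; there is no guarantee that happens early enough to pause the guest before it sees I/O timeouts, so treat this as a starting point rather than a fix.

    #!/bin/bash
    # Hypothetical values; adjust to your environment. Run somewhere that has
    # both the gluster CLI (a peer node) and virsh access to the hypervisor.
    VOL=gv_pri                    # replicated volume holding the VM images
    IMG=images/fwrt2.qcow2        # image path as it appears inside the volume
    VM=fwrt2                      # libvirt domain name

    # Wait until the image shows up in the heal queue, then pause the guest.
    until gluster volume heal "$VOL" info | grep -q "$IMG"; do
        sleep 10
    done
    virsh suspend "$VM"

    # Resume the guest once the image is no longer listed as pending heal.
    while gluster volume heal "$VOL" info | grep -q "$IMG"; do
        sleep 30
    done
    virsh resume "$VM"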
>>> On 27-02-2014 09:08, Fabio Rosati wrote:
>>>> Hi All,
>>>>
>>>> I ran into exactly the same problem encountered by Joao.
>>>> After rebooting one of the GlusterFS nodes, self-heal starts and some VMs
>>>> can't access their disk images anymore.
>>>>
>>>> Logs from one of the VMs after one gluster node was rebooted:
>>>>
>>>> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-2): __ext4_get_inode_loc: unable to read inode block - inode=2145, block=417
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15032608
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307552
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307568
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 15307504
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 12972672
>>>> Feb 25 23:35:47 fwrt2 kernel: EXT4-fs error (device dm-1): ext4_find_entry: reading directory #123 offset 0
>>>> Feb 25 23:35:47 fwrt2 kernel: Core dump to |/usr/libexec/abrt-hook-ccpp 7 0 2757 0 23 1393367747 e pipe failed
>>>> Feb 25 23:35:47 fwrt2 kernel: end_request: I/O error, dev vda, sector 9250632
>>>> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30536)
>>>> Feb 25 23:35:47 fwrt2 kernel: Read-error on swap-device (253:0:30544)
>>>> [...]
>>>>
>>>> A few hours later the VM seemed to be frozen and I had to kill and restart
>>>> it; no more problems after the reboot.
>>>>
>>>> This is the volume layout:
>>>>
>>>> Volume Name: gv_pri
>>>> Type: Distributed-Replicate
>>>> Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
>>>> Status: Started
>>>> Number of Bricks: 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: nw1glus.gem.local:/glustexp/pri1/brick
>>>> Brick2: nw2glus.gem.local:/glustexp/pri1/brick
>>>> Brick3: nw3glus.gem.local:/glustexp/pri2/brick
>>>> Brick4: nw4glus.gem.local:/glustexp/pri2/brick
>>>> Options Reconfigured:
>>>> storage.owner-gid: 107
>>>> storage.owner-uid: 107
>>>> server.allow-insecure: on
>>>> network.remote-dio: on
>>>> performance.write-behind-window-size: 16MB
>>>> performance.cache-size: 128MB
>>>>
>>>> OS: CentOS 6.5
>>>> GlusterFS version: 3.4.2
>>>>
>>>> The qemu-kvm VMs access their qcow2 disk images using the native Gluster
>>>> support (no fuse mount).
>>>> In the Gluster logs I didn't find anything special logged during self-heal,
>>>> but I can post them if needed.
>>>>
>>>> Does anyone have an idea of what can cause these problems?
>>>>
>>>> Thank you
>>>> Fabio
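For readers not familiar with it, the "native Gluster support" Fabio mentions is QEMU's built-in gluster block driver (libgfapi, QEMU 1.3 and later), which opens the image directly against the volume instead of going through a FUSE mount; the server.allow-insecure option in the volume above is commonly set so such clients can connect from non-privileged ports. A hedged sketch of what that access path looks like follows; the host name, volume and image path are borrowed from the volume info above purely as placeholders.

    # Create a qcow2 image directly on the Gluster volume (requires a QEMU
    # build with glusterfs support).
    qemu-img create -f qcow2 gluster://nw1glus.gem.local/gv_pri/images/test.qcow2 20G

    # Boot a guest against the image over libgfapi, bypassing any FUSE mount.
    qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive file=gluster://nw1glus.gem.local/gv_pri/images/test.qcow2,if=virtio,cache=none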
>>>> ----- Original Message -----
>>>> From: "João Pagaime" <joao.pagaime@xxxxxxxxx>
>>>> To: Gluster-users@xxxxxxxxxxx
>>>> Sent: Friday, February 7, 2014 13:13:59
>>>> Subject: self-heal stops some vms (virtual machines)
>>>>
>>>> hello all
>>>>
>>>> I have a replicated volume that holds kvm VMs (virtual machines)
>>>>
>>>> I had to stop one gluster server for maintenance. That part of the
>>>> operation went well: no VM problems after the shutdown
>>>>
>>>> the problems started after booting the gluster server again. Self-healing
>>>> started as expected, but some VMs locked up with disk problems
>>>> (time-outs) as self-healing reached them.
>>>> Some VMs did survive the self-healing, I suppose the ones with low IO
>>>> activity or those less sensitive to disk problems
>>>>
>>>> is there a specific gluster configuration that lets running VMs ride
>>>> through self-healing? (cluster.data-self-heal-algorithm is already set
>>>> to diff)
>>>>
>>>> are there any recommended tweaks for VMs running on top of gluster?
>>>>
>>>> current config:
>>>>
>>>> gluster: 3.3.0-1.el6.x86_64
>>>>
>>>> --------------------- volume:
>>>> # gluster volume info VOL
>>>>
>>>> Volume Name: VOL
>>>> Type: Distributed-Replicate
>>>> Volume ID: f44182d9-24eb-4953-9cdd-71464f9517e0
>>>> Status: Started
>>>> Number of Bricks: 2 x 2 = 4
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: one-gluster01:/san02-v2
>>>> Brick2: one-gluster02:/san02-v2
>>>> Brick3: one-gluster01:/san03
>>>> Brick4: one-gluster02:/san04
>>>> Options Reconfigured:
>>>> diagnostics.count-fop-hits: on
>>>> diagnostics.latency-measurement: on
>>>> nfs.disable: on
>>>> auth.allow: x
>>>> performance.flush-behind: off
>>>> cluster.self-heal-window-size: 1
>>>> performance.cache-size: 67108864
>>>> cluster.data-self-heal-algorithm: diff
>>>> performance.io-thread-count: 32
>>>> cluster.min-free-disk: 250GB
>>>>
>>>> thanks,
>>>> best regards,
>>>> joao
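On João's question above about letting running VMs ride through self-healing: there is no single switch for that in these releases as far as I know, but a few AFR options that already exist in 3.3/3.4 are commonly experimented with on VM-image volumes. The commands below are a hedged sketch rather than a verified recommendation; VOL stands for the volume name, and whether any of these actually prevents the lock-ups reported in this thread is untested.

    # Keep the diff algorithm and a small heal window (both already set above).
    gluster volume set VOL cluster.data-self-heal-algorithm diff
    gluster volume set VOL cluster.self-heal-window-size 1

    # Reduce how many files a client will heal in the background at once.
    gluster volume set VOL cluster.background-self-heal-count 4

    # Leave data heals to the self-heal daemon, so they are not triggered
    # from the VM's own I/O path on the client mount.
    gluster volume set VOL cluster.data-self-heal off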