Problems with qemu and disperse volumes (live merge)

Hi,

I have recently been running into a problem with Gluster disperse volumes and live merge on qemu-kvm.

I am using Gluster as the storage backend of an oVirt cluster. We are planning to use VM snapshots as part of taking daily backups of the VMs, and we are encountering issues when the VMs are stored on a distributed-disperse volume.

For reference, I am using Gluster 7.5, libvirt 6.0, qemu 4.2, and oVirt 4.4.0 on CentOS 8.1.

The sequence of events is the following:

1) On a running VM, create a new snapshot
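
For context, oVirt drives this through libvirt as an external, disk-only snapshot; a rough manual equivalent (the domain and snapshot names here are hypothetical) would be something like:

    virsh snapshot-create-as <vm-name> backup-snap --disk-only --atomic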

The operation completes successfully; however, I can see the following errors in the Gluster logs:

[2020-06-29 21:54:18.942422] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new (a89f2ccb-be41-4ff7-bbaf-abb786e76bc7) (hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) => /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta (f55c1f35-63fa-4d27-9aa9-09b60163e565) (hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)  
[2020-06-29 21:54:18.947273] W [MSGID: 122019] [ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2: Mismatching GFID's in loc
[2020-06-29 21:54:18.947290] W [MSGID: 109002] [dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht: link/file /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta on SSD_Storage-disperse-2 failed [Input/output error]
[2020-06-29 21:54:19.197482] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new (b4888032-3758-4f62-a4ae-fb48902f83d2) (hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) => /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta ((null)) (hash=SSD_Storage-disperse-4/cache=<nul>)  


2) Once the snapshot has been created, try to delete it while the VM is running
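
As far as I understand, deleting the snapshot of a running VM triggers a live merge, which libvirt carries out as a block commit (an active commit with a pivot when the top layer is merged down); a rough manual equivalent (the domain name and disk target are hypothetical) would be:

    virsh blockcommit <vm-name> vda --active --pivot --wait --verbose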

The deletion appears to run for a couple of seconds, and then the qemu-kvm process suddenly crashes. In the qemu VM log I can see the following:

Unexpected error in raw_check_lock_bytes() at block/file-posix.c:811:
2020-06-29T21:56:23.933603Z qemu-kvm: Failed to get shared "write" lock
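
If it helps with debugging: that check comes from qemu's image locking (byte-range/OFD locks taken on the image file), and the same code path can be exercised outside oVirt by opening an image that no VM is currently using (the path below is just a placeholder):

    qemu-img info /path/to/gluster/mount/<sd-uuid>/images/<img-uuid>/<vol-uuid>

qemu-img takes the same locks when it opens the file, so if locking misbehaves on the disperse mount I would expect a similar "Failed to get shared 'write' lock" failure there; for comparison, qemu-img info -U opens the image without taking any locks.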


At the same time, the Gluster logs report the following:

[2020-06-29 21:56:23.850417] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new (1999a713-a0ed-45fb-8ab7-7dbda6d02a78) (hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) => /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta (a89f2ccb-be41-4ff7-bbaf-abb786e76bc7) (hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)  
[2020-06-29 21:56:23.855027] W [MSGID: 122019] [ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2: Mismatching GFID's in loc
[2020-06-29 21:56:23.855045] W [MSGID: 109002] [dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht: link/file /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta on SSD_Storage-disperse-2 failed [Input/output error]
[2020-06-29 21:56:23.922638] I [MSGID: 109066] [dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new (e5c578b3-b91a-4263-a7e3-40f9c7e3628b) (hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) => /58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta (b4888032-3758-4f62-a4ae-fb48902f83d2) (hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4)  
[2020-06-29 21:56:26.017309] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072] (--> /lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (--> /lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
[2020-06-29 21:56:26.017421] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072] (--> /lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (--> /lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
[2020-06-29 21:56:26.017524] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82] (--> /usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072] (--> /lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (--> /lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory


Initially I thought this was a qemu-kvm issue; however, the same procedure works perfectly on a distributed-replicate volume with exactly the same hardware, software, and Gluster volume options.
Also, the issue is reproducible 100% of the time: every time I try to delete the snapshot, the qemu-kvm process crashes.

I am not sure what the best way to proceed is; I have tried to file a bug but unfortunately it didn't get any traction.
The Gluster volume info is below:

Volume Name: SSD_Storage
Type: Distributed-Disperse
Volume ID: 4e1bf45d-9ecd-44f2-acde-dd338e18379c
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (4 + 2) = 36
Transport-type: tcp
Bricks:
Brick1: cld-cnvirt-h01-storage:/bricks/vm_b1/brick
Brick2: cld-cnvirt-h02-storage:/bricks/vm_b1/brick
Brick3: cld-cnvirt-h03-storage:/bricks/vm_b1/brick
Brick4: cld-cnvirt-h04-storage:/bricks/vm_b1/brick
Brick5: cld-cnvirt-h05-storage:/bricks/vm_b1/brick
Brick6: cld-cnvirt-h06-storage:/bricks/vm_b1/brick
Brick7: cld-cnvirt-h01-storage:/bricks/vm_b2/brick
Brick8: cld-cnvirt-h02-storage:/bricks/vm_b2/brick
Brick9: cld-cnvirt-h03-storage:/bricks/vm_b2/brick
Brick10: cld-cnvirt-h04-storage:/bricks/vm_b2/brick
Brick11: cld-cnvirt-h05-storage:/bricks/vm_b2/brick
Brick12: cld-cnvirt-h06-storage:/bricks/vm_b2/brick
Brick13: cld-cnvirt-h01-storage:/bricks/vm_b3/brick
Brick14: cld-cnvirt-h02-storage:/bricks/vm_b3/brick
Brick15: cld-cnvirt-h03-storage:/bricks/vm_b3/brick
Brick16: cld-cnvirt-h04-storage:/bricks/vm_b3/brick
Brick17: cld-cnvirt-h05-storage:/bricks/vm_b3/brick
Brick18: cld-cnvirt-h06-storage:/bricks/vm_b3/brick
Brick19: cld-cnvirt-h01-storage:/bricks/vm_b4/brick
Brick20: cld-cnvirt-h02-storage:/bricks/vm_b4/brick
Brick21: cld-cnvirt-h03-storage:/bricks/vm_b4/brick
Brick22: cld-cnvirt-h04-storage:/bricks/vm_b4/brick
Brick23: cld-cnvirt-h05-storage:/bricks/vm_b4/brick
Brick24: cld-cnvirt-h06-storage:/bricks/vm_b4/brick
Brick25: cld-cnvirt-h01-storage:/bricks/vm_b5/brick
Brick26: cld-cnvirt-h02-storage:/bricks/vm_b5/brick
Brick27: cld-cnvirt-h03-storage:/bricks/vm_b5/brick
Brick28: cld-cnvirt-h04-storage:/bricks/vm_b5/brick
Brick29: cld-cnvirt-h05-storage:/bricks/vm_b5/brick
Brick30: cld-cnvirt-h06-storage:/bricks/vm_b5/brick
Brick31: cld-cnvirt-h01-storage:/bricks/vm_b6/brick
Brick32: cld-cnvirt-h02-storage:/bricks/vm_b6/brick
Brick33: cld-cnvirt-h03-storage:/bricks/vm_b6/brick
Brick34: cld-cnvirt-h04-storage:/bricks/vm_b6/brick
Brick35: cld-cnvirt-h05-storage:/bricks/vm_b6/brick
Brick36: cld-cnvirt-h06-storage:/bricks/vm_b6/brick
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
performance.strict-o-direct: on
network.remote-dio: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30


I have tried many different volume options but unfortunately get the same result. I see the same problem on three different clusters (same versions).
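
(In case the syntax is useful, by "options" I mean Gluster volume settings changed with gluster volume set and the snapshot deletion retried afterwards; the specific option below is only an illustrative example:

    gluster volume set SSD_Storage network.remote-dio on
)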

Any suggestions?

Thanks,
Marco

