Re: Problems with qemu and disperse volumes (live merge)

Hi Strahil,

thanks a million for your reply.

I mainly thought that disperse volumes were not supported because of the complexity of managing them (due to the many possible combinations of host/brick counts and redundancy); however, I assumed that once implemented and managed separately they could be used as VM storage for oVirt, given that they are supported by RHGS in general.

When you say they will not be optimal, are you referring mainly to performance? We did plenty of testing and, in terms of performance, didn't have issues even with I/O-intensive workloads (using SSDs; I did have issues with spinning disks).

Replica 3 with arbiter is the other possible option for us, but it is clearly less efficient in terms of storage usage than the current disperse 4+2 volumes. The bigger issue for us is that with replica 3, having two servers down out of the three in a replica set causes a service outage, while with disperse 4+2 we can withstand two servers down out of six (e.g. one has been brought down for maintenance and, at the same time, another server has an issue). That's the reason I am keen to have this working with disperse -- apart from the specific issue with snapshot deletion, everything seems to work very well.
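
To put rough numbers on the storage-efficiency point: with disperse 4+2, usable capacity is 4/6 of raw (about 67%) and each subvolume survives any two bricks down; with replica 3 + arbiter, only about half of the data-brick capacity is usable (two full copies plus a metadata-only arbiter) and each replica set survives only one brick down:

    disperse 4+2:       usable = 4/6 of raw (~67%), tolerates 2 of 6 bricks down per set
    replica 3 arbiter:  usable = 1/2 of data-brick raw (~50%), tolerates 1 of 3 bricks down per set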

Regarding the options -- apologies, I had applied the group with the "gluster volume set SSD_Storage group virt" command, but for some reason the options are not listed in the "gluster volume info" output. I have re-applied them individually and the results are the same. The list of options I am using is below (see also the verification commands after the list):

Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
performance.client-io-threads: on
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.fips-mode-rchecksum: on
nfs.disable: on
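
In case it is useful, this is how I re-apply the group and verify the effective values ("gluster volume get" reports the effective value of each option even when "volume info" doesn't list it):

    gluster volume set SSD_Storage group virt
    gluster volume get SSD_Storage all | grep -E 'shard|eager-lock|remote-dio|strict-o-direct'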


Unfortunately we have the issue with all VMs -- it doesn't seem to depend on the storage allocation either (thin provisioned or preallocated).

Thanks!
Marco


On Tue, 30 Jun 2020 at 05:12, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Hey Marco,

have you wondered why non-replica volumes are not supported for oVirt (or the paid downstreams)? Also, a disperse volume will not be optimal for your needs.

Have you thought about replica 3 with an arbiter?

Now on the topic.
I don't see the "optimize for virt" options, which you also need to apply (they involve sharding too). You can find them in gluster's group directory (it was something like /var/lib/glusterd/groups/virt).

With an unsupported volume type, and without the options the oVirt community recommends, you can (and most probably will) run into bad situations.

Please, set the virt group options and try again.
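
For example, assuming a default glusterd installation:

    cat /var/lib/glusterd/groups/virt      # lists the options the group applies
    gluster volume set SSD_Storage group virt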

Does the issue occur on another VM?


Best Regards,
Strahil Nikolov


On 30 June 2020 at 1:59:36 GMT+03:00, Marco Fais <evilmf@xxxxxxxxx> wrote:
>Hi,
>
>I am having a problem recently with Gluster disperse volumes and live
>merge
>on qemu-kvm.
>
>I am using Gluster as a storage backend of an oVirt cluster; we are
>planning to use VM snapshots in the process of taking daily backups on
>the
>VMs and we are encountering issues when the VMs are stored in a
>distributed-disperse volume.
>
>First of all, I am using gluster 7.5, libvirt 6.0, qemu 4.2 and oVirt
>4.4.0
>on CentOS 8.1
>
>The sequence of events is the following:
>
>1) On a running VM, create a new snapshot
>
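>(For anyone reproducing this outside oVirt: this is an external libvirt
>snapshot, roughly equivalent to the following command, where the VM and
>snapshot names are placeholders.)
>
>virsh snapshot-create-as VM_NAME snap1 --disk-only --atomic
>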
>The operation completes successfully, however I can observe the
>following
>errors on the gluster logs:
>
>[2020-06-29 21:54:18.942422] I [MSGID: 109066]
>[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new
>(a89f2ccb-be41-4ff7-bbaf-abb786e76bc7)
>(hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) =>
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
>(f55c1f35-63fa-4d27-9aa9-09b60163e565)
>(hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)
>[2020-06-29 21:54:18.947273] W [MSGID: 122019]
>[ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2:
>Mismatching
>GFID's in loc
>[2020-06-29 21:54:18.947290] W [MSGID: 109002]
>[dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht:
>link/file
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
>on SSD_Storage-disperse-2 failed [Input/output error]
>[2020-06-29 21:54:19.197482] I [MSGID: 109066]
>[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new
>(b4888032-3758-4f62-a4ae-fb48902f83d2)
>(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) =>
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta
>((null)) (hash=SSD_Storage-disperse-4/cache=<nul>)
>
>2) Once the snapshot has been created, try to delete it while the VM is
>running
>
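>(Deleting the snapshot of a running VM is performed as a live merge, i.e.
>an active block commit; outside oVirt the rough equivalent is the command
>below, with VM name and disk target as placeholders.)
>
>virsh blockcommit VM_NAME vda --active --wait --verbose --pivot
>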
>The above seems to be running for a couple of seconds and then suddenly
>the
>qemu-kvm process crashes. On the qemu VM logs I can see the following:
>
>Unexpected error in raw_check_lock_bytes() at block/file-posix.c:811:
>2020-06-29T21:56:23.933603Z qemu-kvm: Failed to get shared "write" lock
>
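>(For context: this is qemu's own image-file locking in block/file-posix.c.
>While a VM holds an image open for writing, a second process can only
>inspect it by bypassing the lock, e.g. "qemu-img info -U <image>"; the
>failure above means the running qemu could no longer acquire a lock on its
>own image.)
>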
>At the same time, the gluster logs report the following:
>
>[2020-06-29 21:56:23.850417] I [MSGID: 109066]
>[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta.new
>(1999a713-a0ed-45fb-8ab7-7dbda6d02a78)
>(hash=SSD_Storage-disperse-1/cache=SSD_Storage-disperse-1) =>
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
>(a89f2ccb-be41-4ff7-bbaf-abb786e76bc7)
>(hash=SSD_Storage-disperse-2/cache=SSD_Storage-disperse-1)
>[2020-06-29 21:56:23.855027] W [MSGID: 122019]
>[ec-helpers.c:401:ec_loc_gfid_check] 0-SSD_Storage-disperse-2:
>Mismatching
>GFID's in loc
>[2020-06-29 21:56:23.855045] W [MSGID: 109002]
>[dht-rename.c:1019:dht_rename_links_create_cbk] 0-SSD_Storage-dht:
>link/file
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/64c038a4-5fe4-4f57-8b1c-bab38ae5c5bb.meta
>on SSD_Storage-disperse-2 failed [Input/output error]
>[2020-06-29 21:56:23.922638] I [MSGID: 109066]
>[dht-rename.c:1951:dht_rename] 0-SSD_Storage-dht: renaming
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta.new
>(e5c578b3-b91a-4263-a7e3-40f9c7e3628b)
>(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4) =>
>/58e8dff0-3dfd-4554-9999-b8eb7744ce1b/images/998f0b18-1904-47f3-8cfb-a73ad063ab83/a54793c1-c804-425d-894e-2dfe7a63af4b.meta
>(b4888032-3758-4f62-a4ae-fb48902f83d2)
>(hash=SSD_Storage-disperse-4/cache=SSD_Storage-disperse-4)
>[2020-06-29 21:56:26.017309] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
>(-->
>/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
>/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>[2020-06-29 21:56:26.017421] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
>(-->
>/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
>/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>[2020-06-29 21:56:26.017524] E
>[fuse-bridge.c:227:check_and_dump_fuse_W]
>(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x133)[0x7fd4fa4d6a53]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0x8e82)[0x7fd4f64cee82]
>(-->
>/usr/lib64/glusterfs/7.5/xlator/mount/fuse.so(+0xa072)[0x7fd4f64d0072]
>(-->
>/lib64/libpthread.so.0(+0x82de)[0x7fd4f90582de] (-->
>/lib64/libc.so.6(clone+0x43)[0x7fd4f88aa133] ))))) 0-glusterfs-fuse:
>writing to fuse device failed: No such file or directory
>
>Initially I thought this was a qemu-kvm issue; however the above works
>perfectly on a distributed-replicated volume on exactly the same HW,
>software and gluster volume options.
>Also, the issue can be replicated 100% of the times -- every time I try
>to
>delete the snapshot the process crashes.
>
>Not sure what's the best way to proceed -- I have tried to file a bug
>but
>unfortunately didn't get any traction.
>Gluster volume info here:
>
>Volume Name: SSD_Storage
>Type: Distributed-Disperse
>Volume ID: 4e1bf45d-9ecd-44f2-acde-dd338e18379c
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 6 x (4 + 2) = 36
>Transport-type: tcp
>Bricks:
>Brick1: cld-cnvirt-h01-storage:/bricks/vm_b1/brick
>Brick2: cld-cnvirt-h02-storage:/bricks/vm_b1/brick
>Brick3: cld-cnvirt-h03-storage:/bricks/vm_b1/brick
>Brick4: cld-cnvirt-h04-storage:/bricks/vm_b1/brick
>Brick5: cld-cnvirt-h05-storage:/bricks/vm_b1/brick
>Brick6: cld-cnvirt-h06-storage:/bricks/vm_b1/brick
>Brick7: cld-cnvirt-h01-storage:/bricks/vm_b2/brick
>Brick8: cld-cnvirt-h02-storage:/bricks/vm_b2/brick
>Brick9: cld-cnvirt-h03-storage:/bricks/vm_b2/brick
>Brick10: cld-cnvirt-h04-storage:/bricks/vm_b2/brick
>Brick11: cld-cnvirt-h05-storage:/bricks/vm_b2/brick
>Brick12: cld-cnvirt-h06-storage:/bricks/vm_b2/brick
>Brick13: cld-cnvirt-h01-storage:/bricks/vm_b3/brick
>Brick14: cld-cnvirt-h02-storage:/bricks/vm_b3/brick
>Brick15: cld-cnvirt-h03-storage:/bricks/vm_b3/brick
>Brick16: cld-cnvirt-h04-storage:/bricks/vm_b3/brick
>Brick17: cld-cnvirt-h05-storage:/bricks/vm_b3/brick
>Brick18: cld-cnvirt-h06-storage:/bricks/vm_b3/brick
>Brick19: cld-cnvirt-h01-storage:/bricks/vm_b4/brick
>Brick20: cld-cnvirt-h02-storage:/bricks/vm_b4/brick
>Brick21: cld-cnvirt-h03-storage:/bricks/vm_b4/brick
>Brick22: cld-cnvirt-h04-storage:/bricks/vm_b4/brick
>Brick23: cld-cnvirt-h05-storage:/bricks/vm_b4/brick
>Brick24: cld-cnvirt-h06-storage:/bricks/vm_b4/brick
>Brick25: cld-cnvirt-h01-storage:/bricks/vm_b5/brick
>Brick26: cld-cnvirt-h02-storage:/bricks/vm_b5/brick
>Brick27: cld-cnvirt-h03-storage:/bricks/vm_b5/brick
>Brick28: cld-cnvirt-h04-storage:/bricks/vm_b5/brick
>Brick29: cld-cnvirt-h05-storage:/bricks/vm_b5/brick
>Brick30: cld-cnvirt-h06-storage:/bricks/vm_b5/brick
>Brick31: cld-cnvirt-h01-storage:/bricks/vm_b6/brick
>Brick32: cld-cnvirt-h02-storage:/bricks/vm_b6/brick
>Brick33: cld-cnvirt-h03-storage:/bricks/vm_b6/brick
>Brick34: cld-cnvirt-h04-storage:/bricks/vm_b6/brick
>Brick35: cld-cnvirt-h05-storage:/bricks/vm_b6/brick
>Brick36: cld-cnvirt-h06-storage:/bricks/vm_b6/brick
>Options Reconfigured:
>nfs.disable: on
>storage.fips-mode-rchecksum: on
>performance.strict-o-direct: on
>network.remote-dio: off
>storage.owner-uid: 36
>storage.owner-gid: 36
>network.ping-timeout: 30
>
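>(For reference, a 6 x (4 + 2) layout with this brick ordering would be
>created with something like the following, one disperse set per
>/bricks/vm_bN directory across the six hosts:)
>
>gluster volume create SSD_Storage disperse-data 4 redundancy 2 transport tcp \
>  $(for b in 1 2 3 4 5 6; do for h in 1 2 3 4 5 6; do \
>      printf 'cld-cnvirt-h0%d-storage:/bricks/vm_b%d/brick ' "$h" "$b"; done; done)
>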
>I have tried many different options but unfortunately have the same
>results. I have the same problem in three different clusters (same
>versions).
>
>Any suggestions?
>
>Thanks,
>Marco
