Thank you so much for responding! More below.

> Anything in the logs of the fuse mount? can you stat the file from the mount?
> also, the report of an image is only 64M makes me think about Sharding as the
> default value of Shard size is 64M.
> Do you have any clues on when this issue start to happen? was there any
> operation done to the Gluster cluster?

- I had just created the gluster volumes within an hour of the problem to
  test the very problem I reported. So it was a "fresh start".

- It booted one or two times, then stopped booting. Once it couldn't boot,
  all 3 nodes were the same in that grub2 couldn't boot in the VM image.

As for the fuse log, I did see a couple of these before it happened the
first time. I'm not sure if it's a clue or not.

[2021-01-25 22:48:19.310467] I [fuse-bridge.c:5777:fuse_graph_sync] 0-fuse: switched to graph 0
[2021-01-25 22:50:09.693958] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory
[2021-01-25 22:50:09.694462] E [fuse-bridge.c:227:check_and_dump_fuse_W] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x17a)[0x7f914e346faa] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x874a)[0x7f914a3d374a] (--> /usr/lib64/glusterfs/7.2/xlator/mount/fuse.so(+0x91cb)[0x7f914a3d41cb] (--> /lib64/libpthread.so.0(+0x84f9)[0x7f914cf184f9] (--> /lib64/libc.so.6(clone+0x3f)[0x7f914c76afbf] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory

I have reserved the test system again.
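
For the stat question above, here is a minimal sketch of a check that could be run against the image through the fuse mount. It only compares the apparent file size with the size the image is supposed to have, and flags the case where the file looks exactly one shard long. The image path in the usage comment is hypothetical, "5T" is assumed to mean 5 TiB, and the 64 MiB shard size is the default mentioned in the quoted reply rather than something read from the volume.

```python
import os
import tempfile

# Assumed: the default shard size of 64 MiB mentioned in the quoted reply.
SHARD_SIZE = 64 * 1024 * 1024
# Assumed: "around 5T" is taken to mean 5 TiB for illustration.
EXPECTED_SIZE = 5 * 1024 ** 4


def check_image_size(path, expected_size=EXPECTED_SIZE, shard_size=SHARD_SIZE):
    """Stat a file and compare its apparent size with the expected size."""
    apparent = os.stat(path).st_size
    return {
        "apparent_size": apparent,
        "matches_expected": apparent == expected_size,
        # True when the file looks exactly one shard long (the 64M symptom).
        "looks_like_single_shard": apparent == shard_size,
        # Number of shard-sized pieces the full image would span (ceiling).
        "expected_shards": -(-expected_size // shard_size),
    }


if __name__ == "__main__":
    # Demo on a sparse temporary file; on the real system the path would be
    # the image on the fuse mount, e.g. a hypothetical
    # /adminvm/images/adminvm.img.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(SHARD_SIZE)
        tmp.flush()
        print(check_image_size(tmp.name))
```

Running the same check on each of the 3 physical nodes, before and after the kpartx mount/umount trick, would show whether the size reported through the mount changes.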
My plans today are:

- Start over with the gluster volume on the machine with sles15sp2
  updates.

- Learn if there are modifications to the image (besides
  mounting/umounting filesystems with the image using kpartx to map them
  to force it to work). What if I add/remove a byte from the end of the
  image file, for example?

- Revert the setup to sles15sp2 with no updates. My theory is the updates
  are not making a difference and it's just random chance. (re-making the
  gluster volume in the process)

- The 64MB shard size made me think too!!

- If the team feels it is worth it, I could try a newer gluster. We're
  using the versions we've validated at scale when we have large clusters
  in the factory but if the team thinks I should try something else I'm
  happy to re-build it!!! We are @ 7.2 plus afr-event-gen-changes patch.

I will keep a better eye on the fuse log to tie an error to the problem
starting.

THANKS AGAIN for responding and let me know if you have any more clues!

Erik

>
> On Tue, Jan 26, 2021 at 2:40 AM Erik Jacobson <erik.jacobson@xxxxxxx> wrote:
>
> Hello all. Thanks again for gluster. We're having a strange problem
> getting virtual machines started that are hosted on a gluster volume.
>
> One of the ways we use gluster now is to make a HA-ish cluster head
> node. A virtual machine runs in the shared storage and is backed up by 3
> physical servers that contribute to the gluster storage share.
>
> We're using sharding in this volume. The VM image file is around 5T and
> we use qemu-img with falloc to get all the blocks allocated in advance.
>
> We are not using gfapi largely because it would mean we have to build
> our own libvirt and qemu and we'd prefer not to do that. So we're using
> a glusterfs fuse mount to host the image. The virtual machine is using
> virtio disks but we had similar trouble using scsi emulation.
>
> The issue: - all seems well, the VM head node installs, boots, etc.
>
> However, at some point, it stops being able to boot!
> grub2 acts like it cannot find /boot. At the grub2 prompt, it can see the
> partitions, but reports no filesystem found where there are indeed
> filesystems.
>
> If we switch qemu to use "direct kernel load" (bypass grub2), this often
> works around the problem but in one case Linux gave us a clue. Linux
> reported /dev/vda as only being 64 megabytes, which would explain a lot.
> This means the virtual machine Linux thought the disk supplied by the
> disk image was tiny! 64M instead of 5T
>
> We are using sles15sp2 and hit the problem more often with updates
> applied than without. I'm in the process of trying to isolate if there
> is a sles15sp2 update causing this, or if we're within "random chance".
>
> On one of the physical nodes, if it is in the failure mode, if I use
> 'kpartx' to create the partitions from the image file, then mount the
> giant root filesystem (ie mount /dev/mapper/loop0p31 /mnt) and then
> umount /mnt, then that physical node starts the VM fine, grub2 loads,
> the virtual machine is fully happy! Until I try to shut it down and
> start it up again, at which point it sticks at grub2 again! What about
> mounting the image file makes it so qemu sees the whole disk?
>
> The problem doesn't always happen but once it starts, the same VM image has
> trouble starting on any of the 3 physical nodes sharing the storage.
> But using the trick to force-mount the root within the image with
> kpartx, then the machine can come up. My only guess is this changes the
> file just a tiny bit in the middle of the image.
>
> Once the problem starts, it keeps happening except temporarily working
> when I do the loop mount trick on the physical admin.
>
> Here is some info about what I have in place:
>
> nano-1:/adminvm/images # gluster volume info
>
> Volume Name: adminvm
> Type: Replicate
> Volume ID: 67de902c-8c00-4dc9-8b69-60b93b5f6104
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.255.151:/data/brick_adminvm
> Brick2: 172.23.255.152:/data/brick_adminvm
> Brick3: 172.23.255.153:/data/brick_adminvm
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> cluster.choose-local: off
> client.event-threads: 4
> server.event-threads: 4
> cluster.granular-entry-heal: enable
> storage.owner-uid: 439
> storage.owner-gid: 443
>
> libglusterfs0-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> glusterfs-7.2-4723.1520.210122T1700.a.sles15sp2hpe.x86_64
> python3-gluster-7.2-4723.1520.210122T1700.a.sles15sp2hpe.noarch
>
> nano-1:/adminvm/images # uname -a
> Linux nano-1 5.3.18-24.46-default #1 SMP Tue Jan 5 16:11:50 UTC 2021
> (4ff469b) x86_64 x86_64 x86_64 GNU/Linux
> nano-1:/adminvm/images # rpm -qa | grep qemu-4
> qemu-4.2.0-9.4.x86_64
>
> Would love any advice!!!!
>
> Erik
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Respectfully
> Mahdi