Hi all,

I have promised to do some testing and I finally found some time and infrastructure.

So I have 3 servers with Gluster 3.10.5 on CentOS 7. I created a replicated volume with arbiter (2+1) and a VM on KVM (via OpenStack) with its disk accessed through gfapi. The volume has the virt group applied (gluster volume set gv_openstack_1 group virt). The VM runs a current (all packages updated) Ubuntu Xenial.

I set up the following fio job:

[job1]
ioengine=libaio
size=1g
loops=16
bs=512k
direct=1
filename=/tmp/fio.data2

When I run fio fio.job and reboot one of the data nodes, the I/O statistics reported by fio drop to 0KB/0KB and 0 IOPS. After a while, the guest's root filesystem gets remounted read-only. If you care about the infrastructure, setup details etc., do not hesitate to ask.

Gluster volume info:

Volume Name: gv_openstack_1
Type: Replicate
Volume ID: 2425ae63-3765-4b5e-915b-e132e0d3fff1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gfs-2.san:/export/gfs/gv_1
Brick2: gfs-3.san:/export/gfs/gv_1
Brick3: docker3.san:/export/gfs/gv_1 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off

Partial KVM XML dump:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
    <host name='10.0.1.201' port='24007'/>
  </source>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <serial>77ebfd13-6a92-4f38-b036-e9e55d752e1e</serial>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

Networking is LACP on the data nodes, over a stack of Juniper EX4550s (10Gbps SFP+), with a separate VLAN for Gluster traffic; all Gluster nodes (including the arbiter) are SSD only.

I would really love to know what I am doing wrong, because this has been my experience with Gluster for a long time, and it is the reason I would not recommend it as a VM storage backend in a production environment where you cannot start/stop VMs on your own (e.g. when providing private clouds for customers).
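In case anyone wants to reproduce this, here is roughly what I run and watch while the fio job above is going and one data node reboots. The volume name and guest device are obviously specific to my setup:

# inside the guest
fio fio.job

# inside the guest, second shell - watch for I/O errors and the ro remount
dmesg -w
mount | grep ' / '

# on one of the surviving gluster nodes
gluster volume status gv_openstack_1
gluster volume heal gv_openstack_1 info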
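And in case the question comes up about how the virt group got applied: this is the command I used, plus how I double-check a couple of the resulting options afterwards (using volume get, which I believe is available in 3.10):

gluster volume set gv_openstack_1 group virt
gluster volume get gv_openstack_1 network.ping-timeout
gluster volume get gv_openstack_1 all | grep -E 'quorum|shard|eager'

I have not changed network.ping-timeout from its default, which may well play into how long the VM sits at 0 IOPS before the guest gives up.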
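One thing I have not tried yet: if I read the libvirt documentation correctly, the gluster <source> element can list more than one volfile server, so the initial volfile fetch does not depend on a single host. Something along these lines (untested here; 10.0.1.202 is a made-up second address):

<source protocol='gluster' name='gv_openstack_1/volume-77ebfd13-6a92-4f38-b036-e9e55d752e1e'>
  <host name='10.0.1.201' port='24007'/>
  <host name='10.0.1.202' port='24007'/>
</source>

As far as I understand, once the volfile is fetched the gfapi client talks to all bricks directly, so I would not expect this to explain the hang itself, but I mention it for completeness.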
-ps

On Sun, Sep 3, 2017 at 10:21 PM, Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
> On 30-08-2017 at 17:07, Ivan Rossi wrote:
>>
>> There has been a bug associated with sharding that led to VM
>> corruption, one that has been around for a long time (difficult to
>> reproduce, as I understood). I have not seen reports on it for some
>> time after the last fix, so hopefully VM hosting is now stable.
>
> Mmmm... this is precisely the kind of bug that scares me... data
> corruption :|
> Any more information on what causes it and how to resolve it? Even if
> it is a solved bug in newer Gluster releases, knowledge on how to
> treat it would be valuable.
>
> Thanks.
>
> --
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
> GPG public key ID: FF5F32A8

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users