I believe Proxmox is just an interface to KVM that uses the library, so if I'm not mistaken there are no client logs? It's not the first time I have had the issue; it happens on every heal on the 2 clusters I have. I did let the heal finish that night and the VMs are working now, but it is pretty scary for future crashes or brick replacements.

Should I maybe lower the shard size? It won't solve the fact that 2 bricks out of 3 aren't keeping the filesystem usable, but it might make the healing quicker, right?

Thanks

On 17 April 2016 17:56:37 GMT+02:00, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:
>Could you share the client logs and information about the approximate
>time/day when you saw this issue?
>
>-Krutika
>
>On Sat, Apr 16, 2016 at 12:57 AM, Kevin Lemonnier <lemonnierk@xxxxxxxxx>
>wrote:
>
>> Hi,
>>
>> We have a small GlusterFS 3.7.6 cluster with 3 nodes running Proxmox
>> VMs on it. I did set up the different recommended options, like the
>> virt group, but by hand since it's on Debian. The shards are 256MB,
>> if that matters.
>>
>> This morning the second node crashed, and as it came back up it
>> started a heal, but that basically froze all the VMs running on that
>> volume. Since we really, really can't have 40 minutes of downtime in
>> the middle of the day, I just removed the node from the network,
>> which stopped the heal and allowed the VMs to access their disks
>> again. The plan was to re-connect the node in a couple of hours and
>> let it heal at night. But a VM has crashed now, and it can't boot up
>> again: it seems to freeze trying to access its disks.
>>
>> Looking at the heal info for the volume, it has gone way up since
>> this morning; it looks like the VMs aren't writing to both nodes,
>> just the one they are on. It seems pretty bad: we have 2 nodes out of
>> 3 up, so I would expect the volume to work just fine since it has
>> quorum. What am I missing?
>>
>> It is still too early to start the heal; is there a way to start the
>> VM anyway right now? I mean, it was running a moment ago, so the data
>> is there, it just needs to let the VM access it.
>>
>>
>>
>> Volume Name: vm-storage
>> Type: Replicate
>> Volume ID: a5b19324-f032-4136-aaac-5e9a4c88aaef
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: first_node:/mnt/vg1-storage
>> Brick2: second_node:/mnt/vg1-storage
>> Brick3: third_node:/mnt/vg1-storage
>> Options Reconfigured:
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.readdir-ahead: on
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 256MB
>> cluster.server-quorum-ratio: 51%
>>
>>
>> Thanks for your help
>>
>> --
>> Kevin Lemonnier
>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
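
For reference, a minimal sketch of the gluster CLI calls relevant to the shard-size and heal questions above, assuming the volume name vm-storage from the quoted output. The 64MB value is only an illustration, and the comment about existing files reflects my understanding of how sharding records the block size per file at creation time, not something verified on this cluster:

# Watch what is still pending heal while the self-heal daemon catches up
gluster volume heal vm-storage info
gluster volume heal vm-storage statistics heal-count

# Lowering the shard size only applies to files created after the change;
# existing VM images keep the shard size they were written with, so this
# alone would not speed up healing of the current disks.
gluster volume set vm-storage features.shard-block-size 64MB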
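
A sketch of what the "by hand" virt-group setup mentioned in the quoted message could look like, simply replaying the Options Reconfigured list from the volume info above with gluster volume set (cluster.server-quorum-ratio is a cluster-wide option, hence "all"):

gluster volume set vm-storage cluster.quorum-type auto
gluster volume set vm-storage cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%
gluster volume set vm-storage network.remote-dio enable
gluster volume set vm-storage cluster.eager-lock enable
gluster volume set vm-storage performance.readdir-ahead on
gluster volume set vm-storage performance.quick-read off
gluster volume set vm-storage performance.read-ahead off
gluster volume set vm-storage performance.io-cache off
gluster volume set vm-storage performance.stat-prefetch off
gluster volume set vm-storage features.shard on
gluster volume set vm-storage features.shard-block-size 256MB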