On Sun, Sep 10, 2017 at 3:02 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
>
> On Fri, Sep 8, 2017 at 5:56 AM, Pavel Szalbot <pavel.szalbot@xxxxxxxxx>
> wrote:
>>
>> This is the qemu log of instance:
>>
>> [2017-09-08 09:31:48.381077] C
>> [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired]
>> 0-gv_openstack_1-client-1: server 10.0.1.202:49152 has not responded
>> in the last 1 seconds, disconnecting.
>>
>
> 1 second is not an ideal value for ping timeout. Can you please set it to 30
> seconds or so and simulate the problem? I would be interested in observing
> your logs with a higher ping timeout value.

I restored the default of 42 seconds and no longer experience crashes, even with gfapi, apart from occasional performance drops reported by fio. This applies to replica 3 both with and without an arbiter.

I had previously used a timeout of 3 or 5 seconds; 1 second was only set at the end of last week. So I have now tested the 5-second timeout again and got a crash quite quickly. My guess is that the pings to the nodes that are actually up must time out as well, and that this is the reason for the I/O errors. But 5 seconds seems like quite a long time. Or could it be something else?

BTW, when I restart the node used as the primary mount point (the one specified in mount.glusterfs), I/O stops for several seconds and then fluctuates between 0 and something more or less expected, and this fluctuation persists until the node comes back up. If I restart a different node in the cluster, this does not happen. Is this to be expected?

Thanks,
-ps

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users