Most recently this happened on Gluster 3.6.6; I know it also happened on an earlier 3.6 minor release, maybe 3.6.4. I'm currently on 3.6.8 and can try to re-create it on another replica volume. Which logs would give some useful info, and at which logging level?

From the host with the brick down (2016-02-06 00:40 is approximately when I restarted glusterd to get the brick to start properly):

glfsheal-vm-storage.log
...
[2015-11-30 20:37:17.348673] I [glfs-resolve.c:836:__glfs_active_subvol] 0-vm-storage: switched to graph 676c7573-7465-7230-312e-706369632e75 (0)
[2016-02-06 00:27:15.282280] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:49.797465] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:54.126627] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:27:58.449801] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-02-06 00:31:56.139278] E [client-handshake.c:1496:client_query_portmap_cbk] 0-vm-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
<nothing newer in logs>

The brick log has a massive number of these errors (full log: https://dl.dropboxusercontent.com/u/21916057/mnt-lv-vm-storage-vm-storage.log-20160207.tar.gz):

[2016-02-06 00:43:43.280048] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280159] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710
[2016-02-06 00:43:43.280325] E [socket.c:1972:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1700885605) received from 142.104.230.33:38710

I only peer and mount gluster on a private subnet, so seeing that address is a bit odd, but I don't know if it's related. I've put a rough sketch of the commands I'm thinking of running (brick status, log levels, re-sparsing) at the bottom of this mail.

On Tue, Feb 9, 2016 at 5:38 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
> Hi Steve,
> The patch already went in for 3.6.3
> (https://bugzilla.redhat.com/show_bug.cgi?id=1187547). What version are you
> using? If it is 3.6.3 or newer, can you share the logs if this happens
> again? (or possibly try if you can reproduce the issue on your setup).
> Thanks,
> Ravi
>
>
> On 02/10/2016 02:25 AM, FNU Raghavendra Manjunath wrote:
>
> Adding Pranith, maintainer of the replicate feature.
>
> Regards,
> Raghavendra
>
>
> On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard <sdainard@xxxxxxxx> wrote:
>>
>> There is a thread from 2014 mentioning that the heal process on a
>> replica volume was de-sparsing sparse files.(1)
>>
>> I've been experiencing the same issue on Gluster 3.6.x. I see there is
>> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
>> fix can be back-ported to Gluster 3.6.x?
>>
>> My experience has been:
>> Replica 3 volume
>> 1 brick went offline
>> Brought brick back online
>> Heal full on volume
>> My 500G vm-storage volume went from ~280G used to >400G used.
>>
>> I've experienced this a couple times previously, and used fallocate to
>> re-sparse files but this is cumbersome at best, and lack of proper
>> heal support on sparse files could be disastrous if I didn't have
>> enough free space and ended up crashing my VM's when my storage domain
>> ran out of space.
>>
>> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
>> edge for production systems, I think it makes sense to back-port this
>> fix if possible.
>>
>> Thanks,
>> Steve
>>
>> 1. https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
>> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
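
Back to my logging question above: here is a rough sketch of what I was planning to run to check the brick and capture more detail, assuming I have the volume option names right (volume name vm-storage as above). Happy to adjust if different logs are more useful:

# check whether the brick process is running and has a port assigned
gluster volume status vm-storage

# if a brick shows offline, respawn just the missing brick process
gluster volume start vm-storage force

# raise brick- and client-side log levels before reproducing the heal
gluster volume set vm-storage diagnostics.brick-log-level DEBUG
gluster volume set vm-storage diagnostics.client-log-level DEBUG

# trigger the heal and watch progress
gluster volume heal vm-storage full
gluster volume heal vm-storage info

# drop the log levels back down afterwards
gluster volume set vm-storage diagnostics.brick-log-level INFO
gluster volume set vm-storage diagnostics.client-log-level INFO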
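
And for reference, the fallocate workaround mentioned in my earlier mail quoted above is roughly the following. This is only a sketch: it assumes a util-linux new enough to have fallocate --dig-holes, /path/to/images is a stand-in for the actual image directory, and I only run it with the affected VMs shut down:

# compare blocks actually allocated vs. apparent file size
du -h /path/to/images/vm1.img
du -h --apparent-size /path/to/images/vm1.img

# punch holes back into regions that are all zeroes
# (data is left intact, but the guest using the image should be shut down first)
fallocate --dig-holes /path/to/images/vm1.img

# or sweep the whole directory
for f in /path/to/images/*.img; do fallocate --dig-holes "$f"; done

It works, but as I said it's cumbersome at best on a production storage domain, which is why a proper fix in 3.6 would be much better.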