Re: gluster 5.3: transport endpoint gets disconnected - Assertion failed: GF_MEM_TRAILER_MAGIC

On Thu, Jan 24, 2019 at 12:47 PM Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
Good morning,

We are currently transferring some data to a new glusterfs volume; to check
the throughput of the new volume/setup while the transfer is running, I
decided to create some files on one of the gluster servers with dd in a
loop:

while true; do dd if=/dev/urandom of=/shared/private/1G.file bs=1M count=1024; rm /shared/private/1G.file; done

/shared/private is the mount point of the glusterfs volume. The dd
should run for about an hour. But it has now happened twice that the
transport endpoint got disconnected during this loop:

dd: failed to open '/shared/private/1G.file': Transport endpoint is not connected
rm: cannot remove '/shared/private/1G.file': Transport endpoint is not connected

In /var/log/glusterfs/shared-private.log I see:

[2019-01-24 07:03:28.938745] W [MSGID: 108001] [afr-transaction.c:1062:afr_handle_quorum] 0-persistent-replicate-0: 7212652e-c437-426c-a0a9-a47f5972fffe: Failing WRITE as quorum is not met [Transport endpoint is not connected]
[2019-01-24 07:03:28.939280] E [mem-pool.c:331:__gf_free] (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/cluster/replicate.so(+0x5be8c) [0x7eff84248e8c] -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/cluster/replicate.so(+0x5be18) [0x7eff84248e18] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(__gf_free+0xf6) [0x7eff8a9485a6] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[----snip----]

The whole output can be found here: https://pastebin.com/qTMmFxx0
gluster volume info here: https://pastebin.com/ENTWZ7j3

After umount + mount the transport endpoint is connected again - until
the next disconnect. A /core file gets generated. Maybe someone wants
to have a look at this file?
_________________

Hi Hu Bert,

Thanks for the logs and the report. 'Transport endpoint is not connected' on a mount can occur for two reasons:

1. When the brick holding the file (in the case of a replica, all of its bricks) is unreachable or down. The mount returns to a normal state once the bricks are restarted.
2. When the client process crashes or hits an assertion. In this case /dev/fuse is no longer connected to a process, but the mount still holds a reference. This requires an 'umount' followed by a fresh mount to work again.
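
As a rough illustration of telling the two cases apart, the sketch below checks whether a client process is still serving the mount. The function names are hypothetical and not part of GlusterFS, and the check assumes that the FUSE client runs as a "glusterfs" process whose command line contains the mount point; that may not hold on every setup.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of GlusterFS.

# Map the state of the FUSE client process to one of the two cases above.
# $1 is "yes" if a client process is serving the mount, "no" otherwise.
classify() {
    case "$1" in
        yes) echo "case 1: client alive, check that the bricks are up and reachable" ;;
        no)  echo "case 2: client gone, umount and mount again" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# Assumption: the FUSE client shows up as a "glusterfs" process whose
# command line contains the mount point passed as $1.
client_alive() {
    if pgrep -f "glusterfs.*$1" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# Print the likely case for the mount point given as $1.
diagnose() {
    classify "$(client_alive "$1")"
}
```

With the mount point from the report this would be called as `diagnose /shared/private`. For case 1, running `gluster volume status` on one of the servers shows which bricks are online.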

We will look into this issue and get back to you.

Regards,
Amar
 
______________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Amar Tumballi (amarts)
