Re: peer probe failures

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I’m having a similar issue, were you able to solve it?

Thanks.

 

 

 

Hey all,

 

I've got a strange problem going on here. I've installed glusterfs-server

on ubuntu 16.04:

glusterfs-client/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]

glusterfs-common/xenial,now 3.7.6-1ubuntu1 amd64 [installed,automatic]

glusterfs-server/xenial,now 3.7.6-1ubuntu1 amd64 [installed]

 

I can successfully probe another peer at this point. Then, after installing

kubernetes via kargo, peer probing begins failing with a timeout. I've

tried stopping all kubernetes related services, and flushing all iptables

rules, however I don't see any packets leaving any interface when

attempting to peer probe.

 

from cli.log:

[2017-04-03 22:20:24.704900] I [MSGID: 101190]

[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread

with index 1

[2017-04-03 22:20:24.704973] T [cli.c:273:cli_rpc_notify] 0-glusterfs: got

RPC_CLNT_CONNECT

[2017-04-03 22:20:24.705001] T [cli-quotad-client.c:94:cli_quotad_notify]

0-glusterfs: got RPC_CLNT_CONNECT

[2017-04-03 22:20:24.705014] I [socket.c:2355:socket_event_handler]

0-transport: disconnecting now

[2017-04-03 22:20:24.705204] T [rpc-clnt.c:1404:rpc_clnt_record]

0-glusterfs: Auth Info: pid: 0, uid: 0, gid: 0, owner:

[2017-04-03 22:20:24.705256] T

[rpc-clnt.c:1261:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen

156, payload: 92, rpc hdr: 64

[2017-04-03 22:20:24.705662] T [socket.c:2879:socket_connect] (-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]

(--> /usr/lib/x86_64-linux-gnu

/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4] (-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f697af5]

(--> /usr/lib/x8

6_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338] (-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]

))))) 0-glusterfs: connect

() called on transport already connected

[2017-04-03 22:20:24.705680] D

[rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]

(--> /

usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f

697af5] (-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]

)))

)) 0-: /var/run/gluster/quotad.socket: ping timer event already removed

[2017-04-03 22:20:24.705710] T [cli-quotad-client.c:100:cli_quotad_notify]

0-glusterfs: got RPC_CLNT_DISCONNECT

[2017-04-03 22:20:24.705718] T [rpc-clnt.c:1598:rpc_clnt_submit]

0-rpc-clnt: submitted request (XID: 0x1 Program: Gluster CLI, ProgVers: 2,

Proc: 1) to rpc-transport (glusterfs)

[2017-04-03 22:20:24.705739] D [rpc-clnt-ping.c:281:rpc_clnt_start_ping]

0-glusterfs: ping timeout is 0, returning

[2017-04-03 22:20:24.705723] D [MSGID: 0]

[event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation bumped

on idx=1 from gen=1 to slot->gen=2, fd=7, slot->fd=7

[2017-04-03 22:20:27.614881] T [rpc-clnt.c:418:rpc_clnt_reconnect]

0-glusterfs: attempting reconnect

[2017-04-03 22:20:27.615151] T [socket.c:2879:socket_connect] (-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]

(-->

/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x6c1b)[0x7f012a697c1b]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f012f695999]

(-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfc)[0x7f012fd3d70c]

(--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))

0-glusterfs: connect () called on transport already connected

 

it then repeats the following:

[2017-04-03 22:20:27.615177] T [rpc-clnt.c:418:rpc_clnt_reconnect]

0-glusterfs: attempting reconnect

[2017-04-03 22:20:27.615188] T [socket.c:2887:socket_connect] 0-glusterfs:

connecting 0x25d3550, state=0 gen=0 sock=-1

[2017-04-03 22:20:27.615200] T

[name.c:295:af_unix_client_get_remote_sockaddr] 0-glusterfs: using

connect-path /var/run/gluster/quotad.socket

[2017-04-03 22:20:27.615218] T [name.c:111:af_unix_client_bind]

0-glusterfs: bind-path not specified for unix socket, letting connect to

assign default value

[2017-04-03 22:20:27.615329] T [cli-quotad-client.c:94:cli_quotad_notify]

0-glusterfs: got RPC_CLNT_CONNECT

[2017-04-03 22:20:27.615355] I [socket.c:2355:socket_event_handler]

0-transport: disconnecting now

[2017-04-03 22:20:27.615567] D

[rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x84)[0x7f012f69add4]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x55)[0x7f012f697af5]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f012f698338]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f012f6945b3]

))))) 0-: /var/run/gluster/quotad.socket: ping timer event already removed

[2017-04-03 22:20:27.615598] T [cli-quotad-client.c:100:cli_quotad_notify]

0-glusterfs: got RPC_CLNT_DISCONNECT

[2017-04-03 22:20:27.615612] D [MSGID: 0]

[event-epoll.c:591:event_dispatch_epoll_handler] 0-epoll: generation bumped

on idx=1 from gen=4 to slot->gen=5, fd=7, slot->fd=7

[2017-04-03 22:20:30.615864] T [rpc-clnt.c:418:rpc_clnt_reconnect]

0-glusterfs: attempting reconnect

[2017-04-03 22:20:30.616113] T [socket.c:2879:socket_connect] (-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f012fd21953]

(-->

/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x6c1b)[0x7f012a697c1b]

(-->

/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f012f695999]

(-->

/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfc)[0x7f012fd3d70c]

(--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f012f0b86ba] )))))

0-glusterfs: connect () called on transport already connected

 

and from etc-glusterfs-glusterd.vol.log i see the probe request and then:

[2017-04-03 22:20:24.707258] D [MSGID: 0]

[glusterd-peer-utils.c:133:glusterd_peerinfo_find_by_hostname]

0-management: Unable to find friend: 10.1.0.7

[2017-04-03 22:20:24.707260] D [MSGID: 0]

[glusterd-peer-utils.c:247:glusterd_peerinfo_find] 0-management: Unable to

find hostname: 10.1.0.7

[2017-04-03 22:20:24.707293] I [MSGID: 106129]

[glusterd-handler.c:3600:glusterd_probe_begin] 0-glusterd: Unable to find

peerinfo for host: 10.1.0.7 (24007)

[2017-04-03 22:20:24.707314] D [MSGID: 0]

[glusterd-peer-utils.c:458:glusterd_peer_hostname_new] 0-glusterd:

Returning 0

[2017-04-03 22:20:24.707329] D [MSGID: 0]

[glusterd-utils.c:5974:glusterd_sm_tr_log_init] 0-glusterd: returning 0

[2017-04-03 22:20:24.707353] D [MSGID: 0]

[glusterd-store.c:3802:glusterd_store_create_peer_dir] 0-glusterd:

Returning with 0

[2017-04-03 22:20:24.728550] D [MSGID: 0] [store.c:421:gf_store_handle_new]

0-: Returning 0

[2017-04-03 22:20:24.728679] D [MSGID: 0] [store.c:375:gf_store_save_value]

0-management: returning: 0

[2017-04-03 22:20:24.728720] D [logging.c:1952:_gf_msg_internal]

0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About

to flush least recently used log message to disk

The message "D [MSGID: 0] [store.c:375:gf_store_save_value] 0-management:

returning: 0" repeated 2 times between [2017-04-03 22:20:24.728679] and

[2017-04-03 22:20:24.728718]

[2017-04-03 22:20:24.728720] D [MSGID: 0]

[glusterd-store.c:3931:glusterd_store_peer_write] 0-glusterd: Returning

with 0

[2017-04-03 22:20:24.786973] D [MSGID: 0]

[glusterd-store.c:3957:glusterd_store_perform_peer_store] 0-glusterd:

Returning 0

[2017-04-03 22:20:24.787007] D [MSGID: 0]

[glusterd-store.c:3978:glusterd_store_peerinfo] 0-glusterd: Returning with 0

[2017-04-03 22:20:24.787046] D [MSGID: 0]

[glusterd-handler.c:3391:glusterd_transport_inet_options_build] 0-glusterd:

Returning 0

[2017-04-03 22:20:24.787077] I [rpc-clnt.c:984:rpc_clnt_connection_init]

0-management: setting frame-timeout to 600

[2017-04-03 22:20:24.787090] D [rpc-clnt.c:996:rpc_clnt_connection_init]

0-management: setting ping-timeout to 30

[2017-04-03 22:20:24.787106] D [rpc-transport.c:288:rpc_transport_load]

0-rpc-transport: attempt to load file

/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so

[2017-04-03 22:20:24.787147] T [MSGID: 0]

[options.c:91:xlator_option_validate_int] 0-management: no range check

required for 'option transport.tcp-user-timeout -1'

[2017-04-03 22:20:24.787166] T [MSGID: 0]

[options.c:91:xlator_option_validate_int] 0-management: no range check

required for 'option transport.socket.keepalive-time 10'

[2017-04-03 22:20:24.787182] T [MSGID: 0]

[options.c:91:xlator_option_validate_int] 0-management: no range check

required for 'option transport.socket.keepalive-interval 2'

[2017-04-03 22:20:24.787203] T [MSGID: 0]

[options.c:91:xlator_option_validate_int] 0-management: no range check

required for 'option remote-port 24007'

[2017-04-03 22:20:24.787221] D [socket.c:3845:socket_init] 0-management:

Configued transport.tcp-user-timeout=-1

[2017-04-03 22:20:24.787233] D [socket.c:3928:socket_init] 0-management:

SSL support on the I/O path is NOT enabled

[2017-04-03 22:20:24.787242] D [socket.c:3931:socket_init] 0-management:

SSL support for glusterd is NOT enabled

[2017-04-03 22:20:24.787251] D [socket.c:3948:socket_init] 0-management:

using system polling thread

[2017-04-03 22:20:24.787263] T [rpc-clnt.c:418:rpc_clnt_reconnect]

0-management: attempting reconnect

[2017-04-03 22:20:24.787273] T [socket.c:2887:socket_connect] 0-management:

connecting 0x7ff7fc005640, state=0 gen=0 sock=-1

[2017-04-03 22:20:24.787288] T [MSGID: 0]

[common-utils.c:290:gf_resolve_ip6] 0-resolver: DNS cache not present,

freshly probing hostname: 10.1.0.7

[2017-04-03 22:20:24.787843] D [MSGID: 0]

[common-utils.c:333:gf_resolve_ip6] 0-resolver: returning ip-10.1.0.7

(port-24007) for hostname: 10.1.0.7 and port: 24007

[2017-04-03 22:20:24.787863] D [socket.c:2833:socket_fix_ssl_opts]

0-management: disabling SSL for portmapper connection

[2017-04-03 22:20:24.787883] T [socket.c:805:__socket_nodelay]

0-management: NODELAY enabled for socket 8

[2017-04-03 22:20:24.787899] W [socket.c:869:__socket_keepalive] 0-socket:

failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument

[2017-04-03 22:20:24.787910] E [socket.c:2965:socket_connect] 0-management:

Failed to set keep-alive: Invalid argument

[2017-04-03 22:20:24.788092] D [MSGID: 0]

[common-utils.c:2887:gf_ports_reserved] 0-glusterfs: lower: 30000, higher:

32767

[2017-04-03 22:21:52.911012] D [logging.c:1828:gf_log_flush_timeout_cbk]

0-logging-infra: Log timer timed out. About to flush outstanding messages

if present

[2017-04-03 22:21:52.911131] D [logging.c:1790:__gf_log_inject_timer_event]

0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5

[2017-04-03 22:22:24.707002] D [socket.c:558:__socket_rwv]

0-socket.management: EOF on socket

[2017-04-03 22:22:24.707032] D [socket.c:2355:socket_event_handler]

0-transport: disconnecting now

 

I've debugged this to the best of my ability but haven't made any progress

narrowing it down, nor has google turned up anything useful.

 

Thanks for the help!

 

Ken

 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux