Hi,
Responses inline.
PS: Are you chalkogen_oxygen?
Pranith

On 01/20/2015 05:34 PM, A Ghoshal wrote:
Hello,
I am using the following replicated volume:
root@serv0:~> gluster v info replicated_vol
Volume Name: replicated_vol
Type: Replicate
Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serv0:/mnt/bricks/replicated_vol/brick
Brick2: serv1:/mnt/bricks/replicated_vol/brick
Options Reconfigured:
diagnostics.client-log-level: INFO
network.ping-timeout: 10
nfs.enable-ino32: on
cluster.self-heal-daemon: on
nfs.disable: off
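(For completeness: the trace-level log entries quoted further down were obtained by upping this diagnostics.client-log-level option. I have not copied the exact command here, but it would presumably be along these lines:

root@serv0:~> gluster volume set replicated_vol diagnostics.client-log-level TRACE   # set back to INFO once done

Treat that as a sketch of the usual volume-set syntax rather than a transcript.)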
replicated_vol is mounted at /mnt/replicated_vol on both serv0 and serv1. If I do the following on serv0:
root@serv0:~> echo "cranberries" > /mnt/replicated_vol/testfile
root@serv0:~> echo "tangerines" >> /mnt/replicated_vol/testfile
and then check the state of the replicas on the bricks, I find that:
root@serv0:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
root@serv0:~>
root@serv1:~> cat /mnt/bricks/replicated_vol/brick/testfile
root@serv1:~>
As may be seen, the replica on serv1 is blank when I write into testfile from serv0 (even though the file is created on both bricks). Interestingly, if I write something to the file at serv1, then the two replicas become identical:
root@serv1:~> echo "artichokes" >> /mnt/replicated_vol/testfile
root@serv1:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
artichokes
root@serv1:~>
root@serv0:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
artichokes
root@serv0:~>
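(Side note, in case it helps anybody looking at this: the AFR pending changelog that shows up as pending_matrix in the client logs below is kept as extended attributes on the backend files, so it can be read straight off the bricks. A rough sketch, assuming the usual trusted.afr.<volname>-client-N attribute naming:

root@serv0:~> getfattr -d -m . -e hex /mnt/bricks/replicated_vol/brick/testfile   # dump all xattrs in hex
root@serv1:~> getfattr -d -m . -e hex /mnt/bricks/replicated_vol/brick/testfile

Non-zero trusted.afr.replicated_vol-client-1 counters on serv0's copy would mean writes are being accounted as pending against serv1 and never making it there.)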
So, I dabbled into the logs a little bit after upping the diagnostic level, and this is what I saw.
When I write on serv0 (bad case):
[2015-01-20 09:21:52.197704] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 53027: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
[2015-01-20 09:21:52.197959] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict
[2015-01-20 09:21:52.198006] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 28151, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:21:52.198024] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:21:52.198108] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0)
[2015-01-20 09:21:52.198565] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0)
[2015-01-20 09:21:52.198640] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
[2015-01-20 09:21:52.198669] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
[2015-01-20 09:21:52.198681] D [afr-self-heal-common.c:887:afr_mark_sources] 0-replicated_vol-replicate-0: Number of sources: 1
[2015-01-20 09:21:52.198694] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-replicated_vol-replicate-0: returning read_child: 0
[2015-01-20 09:21:52.198705] D [afr-common.c:1380:afr_lookup_select_read_child] 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
[2015-01-20 09:21:52.198720] D [afr-common.c:1117:afr_lookup_build_response_params] 0-replicated_vol-replicate-0: Building lookup response from 0
[2015-01-20 09:21:52.198732] D [afr-common.c:1732:afr_lookup_perform_self_heal] 0-replicated_vol-replicate-0: Only 1 child up - do not attempt to detect self heal
When I write on serv1 (good case):
[2015-01-20 09:37:49.151506] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 31212: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
[2015-01-20 09:37:49.151683] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict
[2015-01-20 09:37:49.151726] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 7599, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0)
[2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0)
[2015-01-20 09:37:49.152582] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
[2015-01-20 09:37:49.152596] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
[2015-01-20 09:37:49.152621] D [afr-self-heal-common.c:887:afr_mark_sources] 0-replicated_vol-replicate-0: Number of sources: 1
[2015-01-20 09:37:49.152633] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-replicated_vol-replicate-0: returning read_child: 0
[2015-01-20 09:37:49.152644] D [afr-common.c:1380:afr_lookup_select_read_child] 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
[2015-01-20 09:37:49.152657] D [afr-common.c:1117:afr_lookup_build_response_params] 0-replicated_vol-replicate-0: Building lookup response from 0
We see that when we write on serv1, the RPC request is sent to both replicated_vol-client-0 and replicated_vol-client-1, whereas when we write on serv0, the request is sent only to replicated_vol-client-0; the FUSE client is unaware of the presence of client-1 in the latter case.
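(One way I know of to cross-check this without trace logs is to ask the bricks which clients they see, and to grep the mount log for connect/disconnect events. A sketch, assuming the default client log name derived from the mount point:

root@serv0:~> gluster volume status replicated_vol clients                                    # each brick lists its connected clients
root@serv0:~> grep -E "Connected to|disconnected" /var/log/glusterfs/mnt-replicated_vol.log | tail -20

If serv1's brick does not list the serv0 mount among its clients, the FUSE process on serv0 has lost, or never established, its connection to replicated_vol-client-1.)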
I checked a bit more in the logs. When I turned on trace, I found many instances of these logs on serv0 but NOT on serv1:
[2015-01-20 09:21:15.520784] T [fuse-bridge.c:681:fuse_attr_cbk] 0-glusterfs-fuse: 53011: LOOKUP() / => 1
[2015-01-20 09:21:17.683088] T [rpc-clnt.c:422:rpc_clnt_reconnect] 0-replicated_vol-client-1: attempting reconnect
[2015-01-20 09:21:17.683159] D [name.c:155:client_fill_address_family] 0-replicated_vol-client-1: address-family not specified, guessing it to be inet from (remote-host: serv1)
[2015-01-20 09:21:17.683178] T [name.c:225:af_inet_client_get_remote_sockaddr] 0-replicated_vol-client-1: option remote-port missing in volume replicated_vol-client-1. Defaulting to 24007
[2015-01-20 09:21:17.683191] T [common-utils.c:188:gf_resolve_ip6] 0-resolver: flushing DNS cache
[2015-01-20 09:21:17.683202] T [common-utils.c:195:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: serv1
[2015-01-20 09:21:17.683814] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-192.168.24.81 (port-24007) for hostname: serv1 and port: 24007
[2015-01-20 09:21:17.684139] D [common-utils.c:257:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-192.168.24.81 port-24007
[2015-01-20 09:21:17.684164] T [socket.c:731:__socket_nodelay] 0-replicated_vol-client-1: NODELAY enabled for socket 10
[2015-01-20 09:21:17.684177] T [socket.c:790:__socket_keepalive] 0-replicated_vol-client-1: Keep-alive enabled for socket 10, interval 2, idle: 20
[2015-01-20 09:21:17.684236] W [common-utils.c:2247:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info (No such file or directory)
[2015-01-20 09:21:17.684253] W [common-utils.c:2280:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port
The logs above suggest that the mount process could not read the reserved-ports list because it couldn't find the file /proc/sys/net/ipv4/ip_local_reserved_ports. I guess the reboot of the machine fixed it. I wonder why it was not found in the first place.
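A quick way to check on that box (just a sketch):

root@serv0:~> cat /proc/sys/net/ipv4/ip_local_reserved_ports      # should exist, usually prints an empty list
root@serv0:~> sysctl net.ipv4.ip_local_reserved_ports             # same information via sysctl

One caveat: if I remember right, this sysctl was only added around kernel 2.6.35, so on a 2.6.34 kernel the file may simply never exist. In that case the warning is harmless; glusterfs just skips the reserved-port check and carries on.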
Pranith.
[2015-01-20 09:21:17.684660] D [socket.c:605:__socket_shutdown] 0-replicated_vol-client-1: shutdown() returned -1. Transport endpoint is not connected
[2015-01-20 09:21:17.684699] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-replicated_vol-client-1: cleaning up state in transport object 0x68a630
[2015-01-20 09:21:17.684731] D [socket.c:486:__socket_rwv] 0-replicated_vol-client-1: EOF on socket
[2015-01-20 09:21:17.684750] W [socket.c:514:__socket_rwv] 0-replicated_vol-client-1: readv failed (No data available)
[2015-01-20 09:21:17.684766] D [socket.c:1962:__socket_proto_state_machine] 0-replicated_vol-client-1: reading from socket failed. Error (No data available), peer (192.168.24.81:49198)
I could not find a 'remote-port' option in /var/lib/glusterd on either peer. Could somebody tell me where this configuration is read from? Also, some time later I rebooted serv0 and that seemed to solve the problem. However, a stop+start of replicated_vol and a restart of /etc/init.d/glusterd did NOT solve the problem.
Ignore that log. If no port is given in that volfile, it picks 24007 as the port, which is the default port on which glusterd listens.
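If you want to confirm it, something like this should do (a sketch; the volfile path follows the usual /var/lib/glusterd layout and may differ slightly between versions):

root@serv0:~> grep -A3 "protocol/client" /var/lib/glusterd/vols/replicated_vol/replicated_vol-fuse.vol   # remote-host is set, remote-port usually is not
root@serv0:~> netstat -tlnp | grep 24007                                                                 # glusterd listening on the default port

So the 'Defaulting to 24007' message is just the client figuring out how to reach glusterd on serv1, not an error.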
Any help on this matter will be greatly appreciated, as I need to provide robustness assurances for our setup.
Thanks a lot,
Anirban
P.S. Additional details:
glusterfs version: 3.4.2
Linux kernel version: 2.6.34