Including devel
Pranith
On 06/14/2014 02:37 AM, David F. Robinson wrote:
Another update... The previous tests have shown that I can kill
gluster with even a moderate load on the storage system. One thing we
noticed with a previous version of gluster was that the failure was
sensitive to TCP parameters. I have seen other postings on the web
noting similar behavior, along with recommendations for TCP tuning
parameters.
When I use the default TCP parameters, the job dies during I/O and
gluster hangs during the heals, with each of the bricks showing "crawl
in progress". This never clears, and the I/O gets killed...
When I set the following parameters in /etc/sysctl.conf, the job runs
to completion without any issues and I don't get hung heal processes...
# Set by T. Young May 22 2014
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_max_syn_backlog = 4096
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
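In case it helps anyone reproduce this, below is a quick sanity-check
sketch (Python; it assumes each sysctl key maps to a file under
/proc/sys with the dots replaced by slashes, which holds for these
net.* keys) to confirm the running kernel actually picked the values
up after a "sysctl -p":

#!/usr/bin/env python
# Compare the TCP tuning values above against the running kernel.
# net.ipv4.route.flush is a write-only trigger, so it is not checked.
EXPECTED = {
    "net.core.netdev_max_backlog": "2500",
    "net.ipv4.tcp_max_syn_backlog": "4096",
    "net.core.rmem_max": "8388608",
    "net.core.wmem_max": "8388608",
    "net.core.rmem_default": "65536",
    "net.core.wmem_default": "65536",
    "net.ipv4.tcp_rmem": "4096 87380 8388608",
    "net.ipv4.tcp_wmem": "4096 65536 8388608",
    "net.ipv4.tcp_mem": "8388608 8388608 8388608",
}
for key, want in sorted(EXPECTED.items()):
    path = "/proc/sys/" + key.replace(".", "/")
    got = " ".join(open(path).read().split())  # tcp_* values are tab-separated
    status = "OK" if got == want else "MISMATCH (running: %s)" % got
    print("%-30s %s" % (key, status))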
I do still get many thousands of the following messages in the log files:
[2014-06-13 21:05:22.164073] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947: LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file handle)
[2014-06-13 21:05:22.165627] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948: LOOKUP (null) (8589b53e-f8b5-4bf9-9f54-f550e4e768c0) ==> (Stale file handle)
[2014-06-13 21:05:22.166395] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241949: LOOKUP (null) (2ad6bcce-4842-4c29-a319-39f276239b8b) ==> (Stale file handle)
[2014-06-13 21:05:22.166989] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241950: LOOKUP (null) (71b013f7-d508-41ee-8bc8-c8b328ff9f3a) ==> (Stale file handle)
[2014-06-13 21:05:22.167653] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241951: LOOKUP (null) (1d0c99a8-b2ab-402c-a8b2-33f55bcf6123) ==> (Stale file handle)
[2014-06-13 21:05:22.168270] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241952: LOOKUP (null) (c4f8b979-cbf3-4d6b-bcf9-6d5150521e19) ==> (Stale file handle)
[2014-06-13 21:05:22.168797] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241953: LOOKUP (null) (81da3d62-49fc-4465-9fb2-baa6a3278ce3) ==> (Stale file handle)
[2014-06-13 21:05:22.169420] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241954: LOOKUP (null) (dc9e9c2b-f801-452c-8ef7-009e600d23ca) ==> (Stale file handle)
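Since there are thousands of these, something like the sketch below
(Python; the brick log path is just a placeholder) can tally the
failed LOOKUPs by GFID, which should show whether it is a small set of
files repeating or genuinely distinct ones:

#!/usr/bin/env python
# Tally "Stale file handle" LOOKUP messages by GFID.
# The log path is a placeholder -- point it at the actual brick log.
import re
from collections import Counter

PATTERN = re.compile(
    r"server_lookup_cbk.*LOOKUP \(null\) \(([0-9a-f-]+)\) ==> "
    r"\(Stale file handle\)")
counts = Counter()
with open("/var/log/glusterfs/bricks/brick.log") as log:
    for line in log:
        m = PATTERN.search(line)
        if m:
            counts[m.group(1)] += 1
for gfid, n in counts.most_common(10):
    print("%6d  %s" % (n, gfid))

If I understand the backend layout correctly, each GFID also maps to a
path on the brick under .glusterfs/<first two hex chars>/<next
two>/<gfid>, so the worst offenders could then be inspected directly.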
David
------ Original Message ------
From: "Justin Clift" <justin@xxxxxxxxxxx>
To: "David F. Robinson" <david.robinson@xxxxxxxxxxxxx>
Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>; "Pranith Kumar
Karampuri" <pkarampu@xxxxxxxxxx>; "Tom Young" <tom.young@xxxxxxxxxxxxx>
Sent: 6/13/2014 11:16:38 AM
Subject: Re: gluster 3.5.1 beta
Thanks, that's good news on the positive progress front. :)
+ Justin
On 13/06/2014, at 4:12 PM, David F. Robinson wrote:
FYI... The 3.5.1beta2 completed the large rsync... The last time I
tried this, the rsync died after about 3 TB; this time it completed
the 8 TB transfer... The only messages that seem strange in the logs
after the rsync completed are:
[2014-06-13 15:09:30.080574] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227104: LOOKUP (null) (3cf20fd1-ce27-4fbd-aaa6-cd31aa6a13e5) ==> (Stale file handle)
[2014-06-13 15:09:30.969218] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227105: LOOKUP (null) (b7353434-32a4-4674-9f62-f373d3d1d4f2) ==> (Stale file handle)
[2014-06-13 15:10:32.814144] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227114: LOOKUP (null) (ad34cd69-0c90-4de9-9688-34199f6a3ae1) ==> (Stale file handle)
David
------ Original Message ------
From: "Ravishankar N" <ravishankar@xxxxxxxxxx>
To: "Justin Clift" <justin@xxxxxxxxxxx>
Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>; "Tom Young"
<tom.young@xxxxxxxxxxxxx>; "David F. Robinson"
<david.robinson@xxxxxxxxxxxxx>
Sent: 6/13/2014 12:22:58 AM
Subject: Re: gluster 3.5.1 beta
On 06/13/2014 04:03 AM, Justin Clift wrote:
Testing feedback for 3.5.1 beta2 (was in a different email chain).
Some strange-looking messages in the logs (scroll down for the full
details):
[2014-06-12 22:09:54.482481] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
This would be fixed once http://review.gluster.org/#/c/7897/ gets
accepted.
and:
[2014-06-12 21:49:54.326014] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
We still need to root cause this...
+ Justin
Begin forwarded message:
From: "David F. Robinson" <david.robinson@xxxxxxxxxxxxx>
<snip>
FYI. I am retesting gluster 3.5.1-beta2 using the same approach as
before. I gluster-mounted my homegfs partition to a workstation and
am doing an rsync of roughly 8 TB of data. The 3.5.1 version died
after transferring roughly 3-4 TB, with the errors shown in the
previous emails. This run seems to be doing fine and has already
transferred 2.5 TB. The log messages that seemed strange are:
[2014-06-12 22:01:59.872521] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:01:59.872540] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:01:59.872545] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:02.872835] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:02.872855] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:02.872860] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:05.873151] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:05.873171] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:05.873176] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:08.873483] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:08.873504] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:08.873509] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:11.873806] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7feb9f293e80]))) 0-dict: data is NULL
[2014-06-12 22:02:11.873827] W [dict.c:1055:data_to_str] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) [0x7feb9f28f8ec] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7feb9f293fcd] (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7feb9f293e8b]))) 0-dict: data is NULL
[2014-06-12 22:02:11.873832] E [name.c:147:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2014-06-12 22:02:46.073341] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-06-12 22:02:46.073369] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-06-12 21:29:54.225860] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:39:54.276236] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:49:54.325532] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 21:59:54.374955] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 22:09:54.482350] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
[2014-06-12 22:09:54.482481] E [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base index is not createdunder index/base_indices_holder
I am also still seeing these messages (very strange, because there
are no files on the Software volume. That volume is completely
empty...):
[2014-06-12 21:49:54.326014] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 21:49:54.327077] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
[2014-06-12 21:59:54.373724] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-0: lookup failed on index dir on Source-client-0 - (Stale file handle)
[2014-06-12 21:59:54.373950] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-1: lookup failed on index dir on Source-client-2 - (Stale file handle)
[2014-06-12 21:59:54.375302] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 21:59:54.376673] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
[2014-06-12 22:09:54.424471] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-0: lookup failed on index dir on Source-client-0 - (Stale file handle)
[2014-06-12 22:09:54.424667] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Source-replicate-1: lookup failed on index dir on Source-client-2 - (Stale file handle)
[2014-06-12 22:09:54.482812] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-1: lookup failed on index dir on Software-client-2 - (Stale file handle)
[2014-06-12 22:09:54.482910] E [afr-self-heald.c:1189:afr_crawl_build_start_loc] 0-Software-replicate-0: lookup failed on index dir on Software-client-0 - (Stale file handle)
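For what it's worth, those messages point at the per-brick self-heal
index directory, which (if I have the 3.x layout right) lives at
<brick>/.glusterfs/indices/xattrop. A trivial check like the one
below (Python; the brick paths are placeholders), run on each server,
would confirm whether the directory even exists on the Software and
Source bricks:

#!/usr/bin/env python
# Confirm the self-heal index directory exists on each local brick.
# Brick paths are placeholders -- take the real ones from
# "gluster volume info".
import os

BRICKS = ["/data/brick01/Software", "/data/brick02/Source"]
for brick in BRICKS:
    idx = os.path.join(brick, ".glusterfs", "indices", "xattrop")
    if os.path.isdir(idx):
        print("%s: present (%d entries)" % (idx, len(os.listdir(idx))))
    else:
        print("%s: MISSING" % idx)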
David
On 12/06/2014, at 3:16 AM, David F. Robinson wrote:
Roger that. Thanks for the feedback. For testing, this approach
would work fine. If we put gluster into production, it would not
be optimal. Taking the entire data storage offline for the
upgrade would be difficult given the number of machines and the
cluster jobs that are always running.
If you get the rolling upgrade working and need someone to test,
let me know. Happy to test and provide feedback.
Thanks...
David (Sent from mobile)
===============================
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
David.Robinson@xxxxxxxxxxxxx
http://www.corvidtechnologies.com
--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.
My personal twitter: twitter.com/realjustinclift
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel