Bug 1043009 Submitted
On Thursday, December 12, 2013 11:46:03 PM Anand Avati wrote:

> Please provide the full client and server logs (in a bug report). The snippets give some hints, but are not very meaningful without the full context/history since mount time (they have after-the-fact symptoms, but not the part which show the reason why disconnects happened).
>
> Even before looking into the full logs here are some quick observations:
>
> - write-behind-window-size = 1024MB seems *excessively* high. Please set this to 1MB (default) and check if the stability improves.
>
> - I see RDMA is enabled on the volume. Are you mounting clients through RDMA? If so, for the purpose of diagnostics can you mount through TCP and check the stability improves? If you are using RDMA with such a high write-behind-window-size, spurious ping-timeouts are an almost certainty during heavy writes. The RDMA driver has limited flow control, and setting such a high window-size can easily congest all the RDMA buffers resulting in spurious ping-timeouts and disconnections.
>
> Avati
>
> On Thu, Dec 12, 2013 at 5:03 PM, harry mangalam <harry.mangalam@xxxxxxx> wrote:
> > Hi All,
> >
> > (Gluster Volume Details at bottom)
> >
> > I've posted some of this previously, but even after various upgrades, attempted fixes, etc, it remains a problem.
> >
> > Short version: Our gluster fs (~340TB) provides scratch space for a ~5000core academic compute cluster.
> >
> > Much of our load is streaming IO, doing a lot of genomics work, and that is the load under which we saw this latest failure.
> >
> > Under heavy batch load, especially array jobs, where there might be several 64core nodes doing I/O on the 4servers/8bricks, we often get job failures that have the following profile:
> >
> > Client POV:
> >
> > Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all compute nodes that indicated interaction with the user's files
> > <http://pastie.org/8548781>
> >
> > Here are some client Info logs that seem fairly serious:
> > <http://pastie.org/8548785>
> >
> > The errors that referenced this user were gathered from all the nodes that were running his code (in compute*) and agglomerated with:
> >
> > cut -f2,3 -d']' compute* |cut -f1 -dP | sort | uniq -c | sort -gr
> >
> > and placed here to show the profile of errors that his run generated.
> > <http://pastie.org/8548796>
> >
> > so 71 of them were:
> >
> > W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected.
> >
> > etc
> >
> > We've seen this before and previously discounted it bc it seems to have been related to the problem of spurious NFS-related bugs, but now I'm wondering whether it's a real problem.
> >
> > Also the 'remote operation failed: Stale file handle.' warnings.
> >
> > There were no Errors logged per se, tho some of the W's looked fairly nasty, like the 'dht_layout_dir_mismatch'
> >
> > From the server side, however, during the same period, there were:
> >
> > 0 Warnings about this user's files
> > 0 Errors
> > 458 Info lines
> >
> > of which only 1 line was not a 'cleanup' line like this:
> > ---
> > 10.2.7.11:[2013-12-12 21:22:01.064289] I [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on /path/to/file
> > ---
> >
> > it was:
> > ---
> > 10.2.7.14:[2013-12-12 21:00:35.209015] I [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server: 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030 (c9488341-c063-4175-8492-75e2e282f690) ==> trusted.glusterfs.dht
> > ---
> >
> > We're losing about 10% of these kinds of array jobs bc of this, which is just not supportable.
> >
> > Gluster details
> >
> > servers and clients running gluster 3.4.0-8.el6 over QDR IB, IPoIB, thru 2 Mellanox, 1 Voltaire switches, Mellanox cards, CentOS 6.4
> >
> > $ gluster volume info
> > Volume Name: gl
> > Type: Distribute
> > Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
> > Status: Started
> > Number of Bricks: 8
> > Transport-type: tcp,rdma
> > Bricks:
> > Brick1: bs2:/raid1
> > Brick2: bs2:/raid2
> > Brick3: bs3:/raid1
> > Brick4: bs3:/raid2
> > Brick5: bs4:/raid1
> > Brick6: bs4:/raid2
> > Brick7: bs1:/raid1
> > Brick8: bs1:/raid2
> > Options Reconfigured:
> > performance.write-behind-window-size: 1024MB
> > performance.flush-behind: on
> > performance.cache-size: 268435456
> > nfs.disable: on
> > performance.io-cache: on
> > performance.quick-read: on
> > performance.io-thread-count: 64
> > auth.allow: 10.2.*.*,10.1.*.*
> >
> > 'gluster volume status gl detail':
> > <http://pastie.org/8548826>
> >
> > ---
> > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> > 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> > ---
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
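For reference, Avati's two suggestions above translate to roughly the following commands. This is only a sketch: the mount point /mnt/gl and the choice of bs1 as the mount server are placeholders, and the exact option for forcing TCP transport can differ between glusterfs versions, so check 'man mount.glusterfs' on the clients first.

# On one of the gluster servers: drop the write-behind window back to the 1MB default for the 'gl' volume
$ gluster volume set gl performance.write-behind-window-size 1MB

# On a client, for diagnostics: remount the volume over TCP instead of RDMA
$ umount /mnt/gl
$ mount -t glusterfs -o transport=tcp bs1:/gl /mnt/gl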
---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users