Re: gluster fails under heavy array job load


Bug 1043009 Submitted

On Thursday, December 12, 2013 11:46:03 PM, Anand Avati wrote:

> Please provide the full client and server logs (in a bug report). The
> snippets give some hints, but are not very meaningful without the full
> context/history since mount time (they show after-the-fact symptoms, but
> not the part that shows why the disconnects happened).
>
> Even before looking into the full logs, here are some quick observations:
>
> - write-behind-window-size = 1024MB seems *excessively* high. Please set
> this to 1MB (the default) and check whether stability improves.
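For anyone following along, the suggested change can be applied online through gluster's volume-set interface; a sketch against the volume 'gl' described below (assuming the 3.4-era CLI, untested here):

```shell
# Revert the write-behind window to the 1MB default on volume 'gl'.
# Takes effect without remounting clients.
gluster volume set gl performance.write-behind-window-size 1MB

# Confirm the reconfigured value.
gluster volume info gl | grep write-behind-window-size
```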

>
> - I see RDMA is enabled on the volume. Are you mounting clients through
> RDMA? If so, for diagnostic purposes, can you mount through TCP and
> check whether stability improves? If you are using RDMA with such a high
> write-behind-window-size, spurious ping-timeouts are almost a certainty
> during heavy writes. The RDMA driver has limited flow control, and setting
> such a high window size can easily congest all the RDMA buffers, resulting
> in spurious ping-timeouts and disconnections.
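Since the volume is exported as tcp,rdma, the transport can be chosen at mount time with the FUSE client. A sketch of the diagnostic remount (the mountpoint /scratch is illustrative, and 'bs1' is one of the servers listed below):

```shell
# Remount the gl volume over TCP instead of RDMA for the stability test
# (assumes the glusterfs FUSE client; run as root on a compute node).
umount /scratch
mount -t glusterfs -o transport=tcp bs1:/gl /scratch
```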

>
> Avati

> On Thu, Dec 12, 2013 at 5:03 PM, harry mangalam <harry.mangalam@xxxxxxx> wrote:
> > Hi All,
> >
> > (Gluster volume details at bottom.)
> >
> > I've posted some of this previously, but even after various upgrades,
> > attempted fixes, etc., it remains a problem.
> >
> > Short version: our gluster fs (~340TB) provides scratch space for a
> > ~5000-core academic compute cluster. Much of our load is streaming IO
> > from genomics work, and that is the load under which we saw this latest
> > failure. Under heavy batch load, especially array jobs, where there might
> > be several 64-core nodes doing I/O against the 4 servers / 8 bricks, we
> > often get job failures that have the following profile:
> >
> > Client POV:
> > Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all
> > compute nodes that indicated interaction with the user's files:
> > <http://pastie.org/8548781>
> >
> > Here are some client Info logs that seem fairly serious:
> > <http://pastie.org/8548785>
> >
> > The errors that referenced this user were gathered from all the nodes
> > that were running his code (in compute*) and agglomerated with:
> >
> > cut -f2,3 -d']' compute* | cut -f1 -dP | sort | uniq -c | sort -gr
> >
> > and placed here to show the profile of errors that his run generated:
> > <http://pastie.org/8548796>
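For anyone wanting to reproduce that aggregation, here is a self-contained sketch against two fabricated log files (the compute-demo-* names and log lines are made-up stand-ins for the real pastie data):

```shell
# Recreate the error-profile pipeline on two fabricated client-log files.
cat > compute-demo-1 <<'EOF'
[2013-12-12 21:00:01.1] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected. Path: /bio/a
[2013-12-12 21:00:02.2] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected. Path: /bio/b
EOF
cat > compute-demo-2 <<'EOF'
[2013-12-12 21:00:03.3] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Stale file handle. Path: /bio/c
EOF
# Splitting on ']' drops the bracketed timestamp (field 1); 'cut -f1 -dP'
# then trims everything from 'Path: ...' onward, so identical errors
# collapse under 'uniq -c', and 'sort -gr' ranks them by frequency.
cut -f2,3 -d']' compute-demo-* | cut -f1 -dP | sort | uniq -c | sort -gr
```

The output is one line per distinct error, prefixed by its count, most frequent first.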

> >
> > So 71 of them were:
> >
> > W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote
> > operation failed: Transport endpoint is not connected.
> >
> > etc.
> >
> > We've seen this before and previously discounted it because it seemed to
> > be related to the problem of spurious NFS-related bugs, but now I'm
> > wondering whether it's a real problem. The same goes for the 'remote
> > operation failed: Stale file handle.' warnings.
> >
> > There were no Errors logged per se, though some of the Warnings looked
> > fairly nasty, like the 'dht_layout_dir_mismatch' ones.
> >
> > From the server side, however, during the same period, there were:
> > 0 Warnings about this user's files
> > 0 Errors
> > 458 Info lines
> >
> > of which only 1 line was not a 'cleanup' line like this:
> > ---
> > 10.2.7.11:[2013-12-12 21:22:01.064289] I
> > [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on
> > /path/to/file
> > ---
> > It was:
> > ---
> > 10.2.7.14:[2013-12-12 21:00:35.209015] I
> > [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server:
> > 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030
> > (c9488341-c063-4175-8492-75e2e282f690) ==> trusted.glusterfs.dht
> > ---
> >
> > We're losing about 10% of these kinds of array jobs because of this,
> > which is just not supportable.
> >
> > Gluster details
> >
> > Servers and clients are running gluster 3.4.0-8.el6 over QDR IB (IPoIB),
> > through 2 Mellanox and 1 Voltaire switches, with Mellanox cards, on
> > CentOS 6.4.
> >
> > $ gluster volume info
> > Volume Name: gl
> > Type: Distribute
> > Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
> > Status: Started
> > Number of Bricks: 8
> > Transport-type: tcp,rdma
> > Bricks:
> > Brick1: bs2:/raid1
> > Brick2: bs2:/raid2
> > Brick3: bs3:/raid1
> > Brick4: bs3:/raid2
> > Brick5: bs4:/raid1
> > Brick6: bs4:/raid2
> > Brick7: bs1:/raid1
> > Brick8: bs1:/raid2
> > Options Reconfigured:
> > performance.write-behind-window-size: 1024MB
> > performance.flush-behind: on
> > performance.cache-size: 268435456
> > nfs.disable: on
> > performance.io-cache: on
> > performance.quick-read: on
> > performance.io-thread-count: 64
> > auth.allow: 10.2.*.*,10.1.*.*
> >
> > 'gluster volume status gl detail':
> > <http://pastie.org/8548826>
> >
> > ---
> > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> > 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> > ---
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users

 

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---

 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
