Sure thing, #1086303: https://bugzilla.redhat.com/show_bug.cgi?id=1086303

On Thu, Apr 10, 2014 at 7:04 AM, John Mark Walker <jowalker@xxxxxxxxxx> wrote:
> Hi James,
>
> This definitely looks worthy of investigation. Could you file a bug? We
> need to get our guys on this.
>
> Thanks for doing your homework. Send us the BZ #, and we'll start poking
> around.
>
> -JM
>
>
> ----- Original Message -----
>> Hey Joe!
>>
>> Yeah, we are all XFS all the time round here - none of that nasty ext4
>> combo that we know causes raised levels of mercury :-)
>>
>> As for brick errors, we have not seen any; we have been busy grepping
>> and alerting on anything suspect in our logs. Mind you, there are
>> hundreds of brick logs to search through, so I won't say we couldn't
>> have missed one, but after asking the boys in chat just now they are
>> pretty convinced that was not the smoking gun. I'm sure they will chip
>> in on this thread if there is anything.
>>
>>
>> j.
>>
>> --
>> dr. james cuff, assistant dean for research computing, harvard
>> university | division of science | thirty eight oxford street,
>> cambridge. ma. 02138 | +1 617 384 7647 | http://rc.fas.harvard.edu
>>
>>
>> On Wed, Apr 9, 2014 at 10:36 AM, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
>> > What's the backend filesystem?
>> > Were there any brick errors, probably around 2014-03-31 22:44:04 (half
>> > an hour before the frame timeout)?
>> >
>> >
>> > On April 9, 2014 7:10:58 AM PDT, James Cuff <james_cuff@xxxxxxxxxxx> wrote:
>> >>
>> >> Hi team,
>> >>
>> >> I hate "me too" emails, which are sometimes not at all constructive,
>> >> but I feel I really ought to chip in from real-world systems that we
>> >> use in anger and at massive scale here.
>> >>
>> >> So we also use NFS to "mask" this and other performance issues.
>> >> Setting cluster.readdir-optimize gave us similar results,
>> >> unfortunately.
>> >>
>> >> We reported our other challenge back last summer, but we stalled on
>> >> this one:
>> >>
>> >> http://www.gluster.org/pipermail/gluster-users/2013-June/036252.html
>> >>
>> >> We also now see a new NFS phenotype, pasted below, which again is
>> >> causing real heartburn.
>> >>
>> >> Small files are always difficult for any FS; it might be worth doing
>> >> some regression testing with small-file directory scenarios - it's an
>> >> easy reproducer on even moderately sized gluster clusters. I hope
>> >> some good progress can be made; I understand performance hangs like
>> >> these are tough to track down. I just wanted to say that we really do
>> >> see them, and have tried many things to avoid them.
>> >>
>> >> Here's the note from my team:
>> >>
>> >> We were hitting 30 minute timeouts on getxattr/system.posix_acl_access
>> >> calls on directories in an NFS v3 mount (w/ acl option) of a 10-node,
>> >> 40-brick gluster 3.4.0 volume. Strace shows where the client hangs:
>> >>
>> >> $ strace -tt -T getfacl d6h_take1
>> >> ...
>> >> 18:43:57.929225 lstat("d6h_take1", {st_mode=S_IFDIR|0755,
>> >> st_size=7024, ...}) = 0 <0.257107>
>> >> 18:43:58.186461 getxattr("d6h_take1", "system.posix_acl_access",
>> >> 0x7fffdf2b9f50, 132) = -1 ENODATA (No data available) <1806.296893>
>> >> 19:14:04.483556 stat("d6h_take1", {st_mode=S_IFDIR|0755, st_size=7024,
>> >> ...}) = 0 <0.642362>
>> >> 19:14:05.126025 getxattr("d6h_take1", "system.posix_acl_default",
>> >> 0x7fffdf2b9f50, 132) = -1 ENODATA (No data available) <0.000024>
>> >> 19:14:05.126114 stat("d6h_take1", {st_mode=S_IFDIR|0755, st_size=7024,
>> >> ...}) = 0 <0.000010>
>> >> ...
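>> >>
>> >> If anyone wants to sweep a mount for this, here is a rough sketch of
>> >> the timing check we use - the mount point and threshold below are
>> >> illustrative, not our real values:
>> >>
>> >>   MOUNT=/mnt/glustervol   # example NFS mount point
>> >>   THRESHOLD=5             # seconds; flag anything slower than this
>> >>   find "$MOUNT" -type d | while read -r dir; do
>> >>       start=$(date +%s)
>> >>       # getfacl issues the same getxattr(system.posix_acl_access)
>> >>       # call that hangs in the strace above
>> >>       getfacl --absolute-names "$dir" > /dev/null 2>&1
>> >>       elapsed=$(( $(date +%s) - start ))
>> >>       [ "$elapsed" -gt "$THRESHOLD" ] && echo "SLOW: ${elapsed}s $dir"
>> >>   done
>> >>
>> >> A hang like the one above shows up as "SLOW: 1806s <dir>".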
>> >>
>> >> Load on the servers was moderate. While the above was hanging,
>> >> getfacl worked nearly instantaneously on that directory on all bricks.
>> >> When it finally hit the 30 minute timeout, gluster logged it in
>> >> nfs.log:
>> >>
>> >> [2014-03-31 23:14:04.481154] E [rpc-clnt.c:207:call_bail]
>> >> 0-holyscratch-client-36: bailing out frame type(GlusterFS 3.3)
>> >> op(GETXATTR(18)) xid = 0x8168809x sent = 2014-03-31 22:43:58.442411.
>> >> timeout = 1800
>> >> [2014-03-31 23:14:04.481233] W
>> >> [client-rpc-fops.c:1112:client3_3_getxattr_cbk]
>> >> 0-holyscratch-client-36: remote operation failed: Transport endpoint
>> >> is not connected. Path: <gfid:b116fb01-b13d-448a-90d0-a8693a98698b>
>> >> (b116fb01-b13d-448a-90d0-a8693a98698b). Key: (null)
>> >>
>> >> Other than that, we didn't see anything directly related in the nfs
>> >> or brick logs, or anything out of sorts with the gluster services. A
>> >> couple of other errors raise eyebrows, but these involve different
>> >> directories (neighbors of the example above) at different times:
>> >>
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:30:47.794454]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:31:47.794447]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:33:47.802135]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:34:47.802182]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:36:47.764329]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:37:47.773164]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:39:47.774285]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:40:47.780338]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >> holyscratch07: /var/log/glusterfs/nfs.log:[2014-03-31 19:42:47.730345]
>> >> I [dht-layout.c:630:dht_layout_normalize] 0-holyscratch-dht: found
>> >> anomalies in /ramanathan_lab/dhuh/d9_take2_BGI/Diffreg. holes=1
>> >> overlaps=0
>> >>
>> >> holyscratch08:
>> >> /var/log/glusterfs/bricks/holyscratch08_03-brick.log:[2014-03-31
>> >> 00:57:51.973565] E [posix-helpers.c:696:posix_handle_pair]
>> >> 0-holyscratch-posix:
>> >> /holyscratch08_03/brick/ramanathan_lab/dhuh/d9_take2_BGI/cuffdiffRN.txt:
>> >> key:system.posix_acl_access error:Invalid argument
>> >> holyscratch08:
>> >> /var/log/glusterfs/bricks/holyscratch08_03-brick.log:[2014-03-31
>> >> 01:18:12.345818] E [posix-helpers.c:696:posix_handle_pair]
>> >> 0-holyscratch-posix:
>> >> /holyscratch08_03/brick/ramanathan_lab/dhuh/d9_take2_BGI/cuffdiffRN.txt:
>> >> key:system.posix_acl_access error:Invalid argument
>> >> holyscratch05:
>> >> /var/log/glusterfs/bricks/holyscratch05_04-brick.log:[2014-03-31
>> >> 21:16:37.057674] E [posix-helpers.c:696:posix_handle_pair]
>> >> 0-holyscratch-posix:
>> >> /holyscratch05_04/brick/ramanathan_lab/dhuh/d9_take2_BGI/Diffreg/cuffdiffRN.txt:
>> >> key:system.posix_acl_access error:Invalid argument
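>> >>
>> >> For what it's worth, the "timeout = 1800" in the call_bail line
>> >> matches the default network.frame-timeout of 1800 seconds. We swept
>> >> every node for bailed-out frames with something along these lines -
>> >> the holyscratch01..10 hostnames are assumed here from our 10-node
>> >> layout:
>> >>
>> >>   for h in holyscratch{01..10}; do
>> >>       # check nfs and brick logs on each node, prefixing any hits
>> >>       # with the host they came from
>> >>       ssh "$h" "grep call_bail /var/log/glusterfs/nfs.log \
>> >>           /var/log/glusterfs/bricks/*.log 2>/dev/null" | sed "s/^/$h: /"
>> >>   done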
>> >>
>> >> --
>> >> dr. james cuff, assistant dean for research computing, harvard
>> >> university | division of science | thirty eight oxford street,
>> >> cambridge. ma. 02138 | +1 617 384 7647 | http://rc.fas.harvard.edu
>> >>
>> >>
>> >> On Wed, Apr 9, 2014 at 9:52 AM, <james.bellinger@xxxxxxxxxxxxxxxx> wrote:
>> >>>
>> >>> I am seeing something perhaps similar: 3.4.2-1, two servers, each
>> >>> with one brick, replicated. A du of a local (ZFS) directory tree of
>> >>> 297,834 files and 525 GB takes about 17 minutes. A du of the gluster
>> >>> copy is still not finished after 22 hours. Network activity had been
>> >>> about 5-6 KB/sec until (I gather) du hit a directory with 22,450
>> >>> files, when activity jumped to 300 KB/sec (200 packets/sec) for
>> >>> about 15-20 minutes. If I assume that the spike came from scanning
>> >>> the two largest directories, that looks like about 8 KB of traffic
>> >>> per file, and about 5 packets (300 KB/s over ~18 minutes is roughly
>> >>> 320 MB and 200,000 packets, spread over some 40,000 files).
>> >>>
>> >>> A 3.3.2 gluster installation that we are trying to retire is not
>> >>> afflicted this way.
>> >>>
>> >>> James Bellinger
>> >>>
>> >>>>
>> >>>> Am I the only person using Gluster suffering from very slow
>> >>>> directory access? It's so seriously bad that it almost makes
>> >>>> Gluster unusable.
>> >>>>
>> >>>> Using NFS instead of the Fuse client masks the problem as long as
>> >>>> the directories are cached, but it's still hellishly slow when you
>> >>>> first access them.
>> >>>>
>> >>>> Has there been any progress at all fixing this bug?
>> >>>>
>> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1067256
>> >>>>
>> >>>> Cheers,
>> >
>> > --
>> > Sent from my Android device with K-9 Mail. Please excuse my brevity.
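P.S. The posix_handle_pair "Invalid argument" errors in our brick logs can
be cross-checked on the bricks directly; a quick sketch, reusing one of
the paths from the logs above (substitute your own brick path):

  # on the brick server, read the ACL xattr straight off the backend FS
  getfattr -n system.posix_acl_access -e hex \
      /holyscratch08_03/brick/ramanathan_lab/dhuh/d9_take2_BGI/cuffdiffRN.txt
  # and compare with what getfacl reports locally on the brick
  getfacl /holyscratch08_03/brick/ramanathan_lab/dhuh/d9_take2_BGI/cuffdiffRN.txt

If the xattr reads cleanly on every brick but hangs through the volume,
the problem is in transit rather than on disk - which is what we saw, and
why this is now BZ #1086303.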
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users