Thanks, Jeff!

I ran readdir.c on all 23 bricks on the gluster nfs server to which my
test clients are connected (one client that's working, and one that's
not; and I ran it on those, too). The results are attached.

The values it prints are all well within 32 bits, *except* for one
that's suspiciously the max 32-bit signed int:

$ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
0x000000000000fd59
0x000000000000fd6b
0x000000000000fd7d
0x000000000000fd8f
0x000000000000fda1
0x000000000000fdb3
0x000000000000fdc5
0x000000000000fdd7
0x000000000000fde8
0x000000007fffffff

That outlier is the same subdirectory on all 23 bricks. Could this be
the issue?

Thanks,

John

On Fri, Jun 14, 2013 at 11:05 AM, John Brunelle
<john_brunelle at harvard.edu> wrote:
> Thanks for the reply, Vijay. I set that parameter to "On", but it
> hasn't helped, and in fact things seem a bit worse. After making the
> change on the volume and dropping caches on some test clients, some
> now see no subdirectories at all. In my tests before, clients went
> back to seeing all the subdirectories after dropping caches, and only
> after a while did they start disappearing again (and the count had
> never dropped to zero before).
>
> Any other ideas?
>
> Thanks,
>
> John
>
> On Fri, Jun 14, 2013 at 10:35 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>> On 06/13/2013 03:38 PM, John Brunelle wrote:
>>>
>>> Hello,
>>>
>>> We're having an issue with our distributed gluster filesystem:
>>>
>>> * gluster 3.3.1 servers and clients
>>> * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
>>> * xfs backend
>>> * nfs clients
>>> * nfs.enable-ino32: On
>>>
>>> * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
>>> * clients: CentOS 5.7, 2.6.18-274.12.1.el5
>>>
>>> We have a directory containing 3,343 subdirectories. On some clients,
>>> ls lists only a subset of the directories (a different number on
>>> different clients).
>>> On others, ls gets stuck in a getdents loop and
>>> consumes more and more memory until it hits ENOMEM. On yet others,
>>> it works fine. Having the bad clients remount or drop caches makes
>>> the problem temporarily go away, but eventually it comes back. The
>>> issue sounds a lot like bug #838784, but we are using xfs on the
>>> backend, and this seems like more of a client issue.
>>
>>
>> Turning on "cluster.readdir-optimize" can help readdir when a directory
>> contains a number of sub-directories and there are more bricks in the
>> volume. Do you observe any change with this option enabled?
>>
>> -Vijay
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readdir_output.tar.bz2
Type: application/x-bzip2
Size: 327378 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20130614/f91c9ec0/attachment-0001.bz2>
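[For readers following along: Jeff's actual readdir.c isn't included in
this thread, but based on the output format above it presumably walks a
brick directory and prints each entry's d_off. A minimal sketch of such
a helper might look like the following; dump_offsets is a hypothetical
name, and the glibc-specific d_off field of struct dirent is assumed. A
maximum d_off pinned at 0x7fffffff (2^31 - 1) is exactly what a 64-bit
directory offset truncated to a 32-bit signed value would look like.]

```c
#define _GNU_SOURCE  /* expose d_off in struct dirent on glibc */
#include <dirent.h>
#include <stdio.h>

/* Print "0x%016llx name" for every entry in path, matching the
 * formatting of the readdir.out.* files above, and report the
 * largest d_off seen. Returns 0 on success, -1 if opendir fails. */
static int dump_offsets(const char *path, long long *max_off)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    struct dirent *de;
    long long max = 0;
    while ((de = readdir(dir)) != NULL) {
        long long off = (long long)de->d_off;
        printf("0x%016llx %s\n", (unsigned long long)off, de->d_name);
        if (off > max)
            max = off;  /* 0x7fffffff here would flag the outlier */
    }
    closedir(dir);

    if (max_off)
        *max_off = max;
    return 0;
}
```

Running this on each brick's copy of the problem directory and piping
the first column through the sort | uniq pipeline shown above would
reproduce the check; any offset at exactly 0x7fffffff is suspect.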