Thanks for the reply, Vijay.

I set that parameter to "on", but it hasn't helped, and in fact things
seem a bit worse. After making the change on the volume and dropping
caches on some test clients (exact commands in the P.S. below), some
clients now see no subdirectories at all. In my tests before, clients
went back to seeing all the subdirectories after dropping caches, and
they only started disappearing again after a while (and the count had
never gone to zero before).

Any other ideas?

Thanks,

John

On Fri, Jun 14, 2013 at 10:35 AM, Vijay Bellur <vbellur at redhat.com> wrote:
> On 06/13/2013 03:38 PM, John Brunelle wrote:
>>
>> Hello,
>>
>> We're having an issue with our distributed gluster filesystem:
>>
>> * gluster 3.3.1 servers and clients
>> * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
>> * xfs backend
>> * nfs clients
>> * nfs.enable-ino32: On
>>
>> * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
>> * clients: CentOS 5.7, 2.6.18-274.12.1.el5
>>
>> We have a directory containing 3,343 subdirectories. On some clients,
>> ls lists only a subset of the directories (a different number on
>> different clients). On others, ls gets stuck in a getdents loop and
>> consumes more and more memory until it hits ENOMEM. On yet others, it
>> works fine. Having the affected clients remount or drop caches makes
>> the problem go away temporarily, but it eventually comes back. The
>> issue sounds a lot like bug #838784, but we are using xfs on the
>> backend, and this seems more like a client-side issue.
>
> Turning on "cluster.readdir-optimize" can help readdir when a directory
> contains a large number of sub-directories and the volume has many
> bricks. Do you observe any change with this option enabled?
>
> -Vijay
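P.S. For reference, here's roughly what I ran; "gv0" is a placeholder
for our actual volume name, and both commands assume a root shell:

    # on one of the gluster servers: enable the readdir optimization
    gluster volume set gv0 cluster.readdir-optimize on

    # on an NFS client: drop pagecache, dentries, and inodes so the
    # next readdir starts from a cold cache
    sync
    echo 3 > /proc/sys/vm/drop_caches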
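And a generic way to watch the getdents behavior on an affected client
(the mount path here is likewise a placeholder):

    # trace the directory-read syscalls ls makes; a client stuck in the
    # loop keeps issuing getdents calls without ever reaching
    # end-of-directory (a return value of 0)
    strace -e trace=getdents,getdents64 ls /mnt/gv0/thedir > /dev/null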