Very slow directory listing and high CPU usage on replicated volume

jdarcy at redhat.com (Jeff Darcy) · Tue, 06 Nov 2012 12:26:35 -0500

On 11/06/2012 11:27 AM, Fernando Frediani (Qube) wrote:
> Why does it need to rely on FUSE? Why can't it be something that runs in the
> kernel and doesn't have any reliance on FUSE? I imagine that would require
> a lot of engineering, but the benefits need no mention.

"A lot" doesn't even capture the magnitude of the effort.  For all of its
warts, which seem to be most of what we hear about, GlusterFS embodies some
pretty advanced technology - not just in the I/O path, but things like online
maintenance and even reconfiguration as well.  None of that would have been
possible with the slower development pace of working within the kernel (said as
a kernel developer since before there was Linux BTW).  We'd have to completely
stop all other feature development or bug fixing, retrain half the staff,
and work for a full year or two to get our own code plus all of the libraries
we rely on into the kernel, and then we wouldn't be portable to other platforms
as we are now.  There's a reason Ceph, which was brilliantly conceived and is
being worked on by an excellent team, has taken this long to become semi-stable
and still lags in most areas other than performance.  The fact that it's in
user space is an integral part of how GlusterFS has evolved and will continue
to evolve.

> Does anyone know a
> bit about the architecture of Isilon and of other POSIX-compliant distributed
> filesystems?

Quite a bit, actually, though of course more about the open-source ones than
about the proprietary ones like Isilon.  We all face many of the same problems,
and make some of the same tradeoffs.  Some have chosen single-metadata-server
models that work great for small systems that never fail but become a nightmare
for truly large systems or those that have to stay up despite hardware
failures.  Some have chosen to do more caching, either with or without
invalidation from the server.  GlusterFS has historically chosen a pretty
strong consistency model, but has also chosen a client-centric model that
precludes an invalidation-based implementation.  One could certainly argue
about whether those are the right choices and they might change some day - I
personally would like to see both weaker consistency and more use of
lease-based caching - but for now and for the immediate future those choices
determine whether a given workload will perform well or poorly.  There's only
so much we can do with optimization.
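
To make "lease-based caching" concrete, here is a minimal sketch of the client
side of the idea. It is not GlusterFS code; the class name, the fetch callable
and the lease length are all hypothetical, chosen only for illustration. The
client trusts a cached value until its lease expires and only then goes back
to the server, so no invalidation message from the server is needed.

```python
import time


class LeaseCache:
    """Hypothetical client-side cache: an entry is trusted until its lease expires."""

    def __init__(self, fetch, lease_seconds, clock=time.monotonic):
        self._fetch = fetch          # assumed callable: key -> value read from the server
        self._lease = lease_seconds  # how long a cached value may be served
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]          # lease still valid: no round trip
        value = self._fetch(key)     # missing or expired: ask the server again
        self._entries[key] = (value, now + self._lease)
        return value
```

The server side is not modeled here. Without a server that holds back or
recalls conflicting writes while a lease is outstanding, a reader can see data
that is stale by up to one lease period, which is the kind of weaker
consistency such a design would trade for fewer round trips.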