Re: ganesha.conf template for nfs-rgw (was Re: getting inconsistent results in nfs-rgw readdir :( )

Setting dir_chunk=0 bypasses all of Ganesha's readdir code, which is what completely breaks RGW, since RGW depends on that code. A bit of background:

NFS uses a POSIX-like cookie system for readdir. Each dirent has a cookie (a 64-bit integer) associated with it that, when passed back to the server in another readdir, yields the dirent *after* the one associated with that cookie. This can cause issues, even on local filesystems, when mutation of the directory changes the ordering of dirents.
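To make that concrete, here is a minimal, self-contained C sketch of resuming a listing from a cookie (illustrative only: real NFS cookies are opaque to the client, and these are not Ganesha's actual structures):

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* A directory as the server sees it: each entry carries a cookie. */
    struct dirent_entry { uint64_t cookie; const char *name; };

    static const struct dirent_entry dir[] = {
        { 100, "a.txt" }, { 200, "b.txt" }, { 300, "c.txt" },
    };

    /* Return entries strictly *after* the given cookie (0 = start).
     * If the directory mutates and entries reorder, a cookie saved by
     * the client can resume in the wrong place. */
    static void readdir_from(uint64_t cookie)
    {
        for (size_t i = 0; i < sizeof(dir) / sizeof(dir[0]); i++)
            if (dir[i].cookie > cookie)
                printf("%s (cookie %llu)\n", dir[i].name,
                       (unsigned long long)dir[i].cookie);
    }

    int main(void)
    {
        readdir_from(0);    /* full listing: a.txt, b.txt, c.txt */
        readdir_from(200);  /* resume after b.txt: prints only c.txt */
        return 0;
    }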

RGW cannot support this directly, as it has no concept of an inode. Instead, it does listings based on object names of "arbitrary" length, with the listing starting at the first object matching the name. This means that something has to map between names and cookies. Since Ganesha is the NFS translator, it does this mapping in its readdir code, so disabling that code (with dir_chunk=0) breaks RGW directory listings completely.
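For illustration only (a hypothetical sketch, not Ganesha's actual implementation), the mapping problem looks roughly like this: the client hands back a cookie, and the server must recover the object name that a marker-based listing like RGW's understands:

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    #define MAX_ENTRIES 1024

    /* Hypothetical per-directory table: cookie i maps to the i-th
     * object name returned by the backend listing. */
    static const char *names[MAX_ENTRIES];
    static size_t n_names;

    /* Assign the next cookie to a name as the listing is filled in. */
    static uint64_t cookie_for(const char *name)
    {
        names[n_names] = name;
        return (uint64_t)++n_names;    /* cookies 1..N; 0 means "start" */
    }

    /* Recover the name marker for a cookie the client sent back. */
    static const char *name_for(uint64_t cookie)
    {
        return (cookie >= 1 && cookie <= n_names) ? names[cookie - 1]
                                                  : NULL;
    }

    int main(void)
    {
        cookie_for("photos/a.jpg");
        uint64_t c = cookie_for("photos/b.jpg");
        printf("resume listing after object \"%s\"\n", name_for(c));
        return 0;
    }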

It's my opinion that dir_chunk should generally be used by all FSALs, if at all possible. We've put a lot of work into the readdir code, and in all of our testing it has made readdir traversals much faster than having it disabled. I admit I haven't run numbers on CephFS, so CephFS might be the exception.

This opinion aside, RGW depends on having dir_chunk enabled, and it is a global config knob in Ganesha. So, if CephFS and RGW are going to share a Ganesha instance, that instance has to have dir_chunk enabled. If we want dir_chunk disabled for CephFS, then we will need to stand up separate Ganesha instances for CephFS and RGW. This should be doable in a containerized setup, but will be more difficult on real hardware, as one of them will need to run on a non-standard port.
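Roughly, that split might look like this in ganesha.conf (NFS_Core_Param/NFS_Port and MDCACHE/Dir_Chunk are real config blocks; the specific port and values here are only illustrative):

    # Instance serving RGW: standard port, readdir chunking enabled
    NFS_Core_Param { NFS_Port = 2049; }
    MDCACHE { Dir_Chunk = 128; }    # default value; must be non-zero for RGW

    # Instance serving CephFS: non-standard port, chunking disabled
    NFS_Core_Param { NFS_Port = 12049; }    # illustrative port choice
    MDCACHE { Dir_Chunk = 0; }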

(And the docs aren't great, sorry about that)

Daniel

On 5/26/21 8:55 AM, Jeff Layton wrote:
Good question:

When I originally started doing the clustered ganesha over cephfs work,
I immediately moved to disable as much caching as possible in ganesha,
figuring that:

a) double caching is wasteful with memory

...and...

b) that libcephfs knows better when it's safe to cache and when not

So, other more experienced ganesha folks recommended some settings at
that time, including dir_chunk=0.

I've never done any significant testing with dir_chunk set to anything
but 0, and I don't have a very clear idea of what setting dir_chunk
actually _does_.

Ganesha's docs are no help here either. They just say:

Dir_Chunk(uint32, range 0 to UINT32_MAX, default 128)
     Size of per-directory dirent cache chunks, 0 means directory
chunking is not enabled.

...but I'm not sure what directory chunking even _is_ and when and why
I'd want to enable or disable it.

If we set it to a non-zero value (or don't set it at all), what sort of
effects can we expect?

-- Jeff

On Tue, 2021-05-25 at 10:35 -0500, Sage Weil wrote:
Adding dev list.

Jeff, is it okay to remove dir_chunk=0 for the cephfs case?

sage


On Tue, May 25, 2021 at 7:43 AM Daniel Gryniewicz <dang@xxxxxxxxxx> wrote:

I think dir_chunk=0 should never be used, even for cephfs.  It's not
intended to be used in general, only for special circumstances (an
out-of-tree FSAL asked for it, and we use it upstream for debugging
readdir), and it may go away in a future version of Ganesha.

The rest is probably okay for both of them.  However, this raises some
issues.  Some settings, such as dir_chunk=0, Attr_Expiration_Time=0, and
only_numeric_owners=true, are global to Ganesha.  This means that, if
CephFS and RGW need different global settings, they'd have to run in
different instances of Ganesha.  Is this something we're interested in?

Daniel

On 5/25/21 8:11 AM, Sebastian Wagner wrote:
Moving this to upstream, as this is an upstream issue.

Hi Mike, hi Sage,

Do we need to rethink how we deploy ganesha daemons? Looks like we need
different ganesha.conf templates for cephfs and rgw.

- Sebastian

Am 25.05.21 um 13:59 schrieb Matt Benjamin:
Hi Sebastian,

1. yes, I think we should use different templates
2. MDCACHE { dir_chunk = 0; } is fatal for RGW NFS--it seems suited to
avoid double caching of vnodes in the cephfs driver, but simply cannot
be used with RGW
3. RGW has some other preferences--for example, some environments
might prefer only_numeric_owners = true;  Sage is already working on
extending cephadm to generate exports differently, which should allow
for multiple tenants

Matt

On Tue, May 25, 2021 at 7:39 AM Sebastian Wagner <sewagner@xxxxxxxxxx>
wrote:
Hi Matt,

This is the ganesha.conf template that we use for both cephfs and rgw:

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/templates/services/nfs/ganesha.conf.j2


I have the slight impression that we might need two different templates
for rgw and cephfs?

Best,
Sebastian

...snip...






