Re: ganesha.conf template for nfs-rgw (was Re: getting inconsistent results in nfs-rgw readdir :( )

CephFS is not really too different from other filesystems in this
regard, so if it's safe on other fs's then it sounds like it's probably
safe to remove dir_chunk=0.

That said, I've not done any real testing without that option in place,
so we'd need to test it to be sure that it doesn't cause any issues wrt
correctness.

Cheers,
Jeff

On Wed, 2021-05-26 at 10:02 -0400, Daniel Gryniewicz wrote:
> Setting dir_chunk=0 bypasses all Ganesha readdir code.  This is what 
> completely breaks RGW, since RGW depends on that code.  A bit of background.
> 
> NFS uses a POSIX-like cookie system for readdir.  This means that each 
> dirent has a cookie (a 64-bit integer) associated with it; when that 
> cookie is passed back to the server in a later readdir, the server returns 
> the dirent *after* the one associated with the cookie.  This can cause 
> issues, even on local filesystems, when mutation of the directory changes 
> the ordering of dirents.
> 
> RGW cannot support this directly, as it has no concept of an inode. 
> Instead, it does listings based on object names, of "arbitrary" length, 
> and the listing starts with the first object matching the name.  This 
> means that something has to do a mapping between names and cookies. 
> Since Ganesha is the NFS translator, it does this mapping in its 
> readdir code.  This means that disabling the readdir code (with 
> dir_chunk=0) will break RGW directory listings completely.
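> 
> For reference, the knob in question lives in the MDCACHE block.  A minimal 
> sketch (the values are just illustrative; the default of 128 comes from the 
> Ganesha docs quoted further down in this thread):
> 
>     MDCACHE {
>         # Any non-zero chunk size keeps Ganesha's readdir/cookie-mapping
>         # code active, which RGW needs.
>         Dir_Chunk = 128;
> 
>         # Dir_Chunk = 0;  # bypasses that code entirely and breaks RGW
>     }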
> 
> It's my opinion that dir_chunk should generally be used by all FSALs, if 
> at all possible.  We've put a lot of work into the readdir code, and in 
> all of our testing, it has made readdir traversals much faster than 
> having it disabled.  I admit, I have not run numbers on CephFS, so this 
> might be the exception.
> 
> This opinion aside, RGW depends on having dir_chunk enabled, and this is 
> a global config knob on Ganesha.  So, if CephFS and RGW are going to 
> share a Ganesha instance, it has to have dir_chunk enabled.  If we want 
> to have dir_chunk disabled for CephFS, then we will need to stand up 
> separate Ganesha instances for CephFS and RGW.  This should be doable on 
> a containerized setup, but will be more difficult on real hardware, as 
> one of them will need to run on a non-standard port.
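> 
> For what it's worth, splitting them would look roughly like this -- two 
> separate ganesha.conf files, with the second instance moved off port 2049 
> (NFS_Port in NFS_CORE_PARAM, if I'm remembering the name right; the 12049 
> below is just a made-up example):
> 
>     # Instance 1: CephFS exports, standard port
>     NFS_CORE_PARAM {
>         NFS_Port = 2049;
>     }
>     MDCACHE {
>         Dir_Chunk = 0;    # only if we really want chunking off for CephFS
>     }
> 
>     # Instance 2: RGW exports, non-standard port
>     NFS_CORE_PARAM {
>         NFS_Port = 12049;
>     }
>     MDCACHE {
>         Dir_Chunk = 128;  # chunking must stay enabled for RGW
>     }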
> 
> (And the docs aren't great, sorry about that)
> 
> Daniel
> 
> On 5/26/21 8:55 AM, Jeff Layton wrote:
> > Good question:
> > 
> > When I originally started doing the clustered ganesha over cephfs work,
> > I immediately moved to disable as much caching as possible in ganesha,
> > figuring that:
> > 
> > a) double caching is wasteful with memory
> > 
> > ...and...
> > 
> > b) that libcephfs knows better when it's safe to cache and when not
> > 
> > So, other more experienced ganesha folks recommended some settings at
> > that time, including dir_chunk=0.
> > 
> > I've never done any significant testing with dir_chunk set to anything
> > but 0, and I don't have a very clear idea of what setting dir_chunk
> > actually _does_.
> > 
> > Ganesha's docs are no help here either. They just say:
> > 
> > Dir_Chunk(uint32, range 0 to UINT32_MAX, default 128)
> >      Size of per-directory dirent cache chunks, 0 means directory
> > chunking is not enabled.
> > 
> > ...but I'm not sure what directory chunking even _is_ and when and why
> > I'd want to enable or disable it.
> > 
> > If we set it to a non-zero value (or don't set it at all), what sort of
> > effects can we expect?
> > 
> > -- Jeff
> > 
> > On Tue, 2021-05-25 at 10:35 -0500, Sage Weil wrote:
> > > Adding dev list.
> > > 
> > > Jeff, is it okay to remove dir_chunk=0 for the cephfs case?
> > > 
> > > sage
> > > 
> > > 
> > > On Tue, May 25, 2021 at 7:43 AM Daniel Gryniewicz <dang@xxxxxxxxxx> wrote:
> > > > 
> > > > I think dir_chunk=0 should never be used, even for cephfs.  It's not
> > > > intended to be used in general, only for special circumstances (an
> > > > out-of-tree FSAL asked for it, and we use it upstream for debugging
> > > > readdir), and it may go away in a future version of Ganesha.
> > > > 
> > > > The rest is probably okay for both of them.  However, this raises some
> > > > issues.  Some settings, such as dir_chunk=0, Attr_Expiration_Time=0, and
> > > > only_numeric_owners=true are global to Ganesha.  This means that, if
> > > > CephFS and RGW need different global settings, they'd have to run in
> > > > different instances of Ganesha.  Is this something we're interested in?
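> > > > 
> > > > For concreteness, those global knobs would sit roughly here in 
> > > > ganesha.conf (block placement per my reading of the Ganesha docs, so 
> > > > worth double-checking; Attr_Expiration_Time can also be set per-export):
> > > > 
> > > >     MDCACHE {
> > > >         Dir_Chunk = 0;               # global; this is the one that breaks RGW
> > > >     }
> > > >     NFSV4 {
> > > >         Only_Numeric_Owners = true;  # preferred in some RGW environments
> > > >     }
> > > >     EXPORT_DEFAULTS {
> > > >         Attr_Expiration_Time = 0;    # disable attribute caching
> > > >     }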
> > > > 
> > > > Daniel
> > > > 
> > > > On 5/25/21 8:11 AM, Sebastian Wagner wrote:
> > > > > Moving this to upstream, as this is an upstream issue.
> > > > > 
> > > > > Hi Mike, hi Sage,
> > > > > 
> > > > > Do we need to rethink how we deploy ganesha daemons? Looks like we need
> > > > > different ganesha.conf templates for cephfs and rgw.
> > > > > 
> > > > > - Sebastian
> > > > > 
> > > > > Am 25.05.21 um 13:59 schrieb Matt Benjamin:
> > > > > > Hi Sebastian,
> > > > > > 
> > > > > > 1. yes, I think we should use different templates
> > > > > > 2. MDCACHE { dir_chunk = 0; } is fatal for RGW NFS--it seems suited to
> > > > > > avoid double caching of vnodes in the cephfs driver, but simply cannot
> > > > > > be used with RGW
> > > > > > 3. RGW has some other preferences--for example, some environments
> > > > > > might prefer only_numeric_owners = true;  Sage is already working on
> > > > > > extending cephadm to generate exports differently, which should allow
> > > > > > for multiple tenants
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > On Tue, May 25, 2021 at 7:39 AM Sebastian Wagner <sewagner@xxxxxxxxxx>
> > > > > > wrote:
> > > > > > > Hi Matt,
> > > > > > > 
> > > > > > > This is the ganesha.conf template that we use for both cephfs and rgw:
> > > > > > > 
> > > > > > > https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/templates/services/nfs/ganesha.conf.j2
> > > > > > > 
> > > > > > > 
> > > > > > > I have the slight impression that we might need two different templates
> > > > > > > for rgw and cephfs?
> > > > > > > 
> > > > > > > Best,
> > > > > > > Sebastian
> > > > > 
> > > > > ...snip...
> > > > > 
> > > > 
> > > 
> > 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


