Re: [nfs-ganesha][PATCH] FSAL_CEPH: do an inode lookup vs. MDS when the Inode is not in cache

On Fri, 2017-11-10 at 21:39 +0800, Yan, Zheng wrote:
> On Fri, Nov 10, 2017 at 7:24 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Fri, 2017-11-10 at 10:31 +0800, Yan, Zheng wrote:
> > > On Fri, Nov 10, 2017 at 12:03 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > From: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > 
> > > > FSAL_CEPH just does a ceph_ll_get_inode when asked to create a
> > > > filehandle today, which only does a query of the local Ceph client
> > > > cache. This means it can only find Inodes that were previously found
> > > > via name lookup.
> > > > 
> > > > When ganesha is restarted, we have a blank-slate Client, and the Inode
> > > > will not be in cache. That ends up with the NFS client getting back an
> > > > ESTALE.
> > > > 
> > > > We must be able to find Inodes that may not be in the cache. When
> > > > ceph_ll_get_inode can't find an Inode, try a lookup vs. the MDS for it.
> > > > 
> > > > The only problem is that we currently have no way to look up a snapped
> > > > inode from scratch. I think we'll probably need to add that ability to
> > > > libcephfs, but for now the best we can do is just bail out with an
> > > > ESTALE in that case.
> > > > 
> > > > While we're in there, remove the comment about ceph_ll_connectable_m.
> > > > There is no such function in libcephfs and never has been. The ds
> > > > code does seem to refer to such a call, but it's not clear to me
> > > > what it's supposed to do.
> > > > 
> > > > Change-Id: I2d0e362ef8f28ba78575b60f3fb2890096d98fc6
> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > > > 
> > > > I sent this up through git-review to ganesha's gerrit, but I'd
> > > > like some cephfs folks to have a look too. This patch is basically
> > > > a workaround for the FSAL_CEPH code for now.
> > > > 
> > > > Do we need to worry about inodes that are not CEPH_NOSNAP? It
> > > > seems like we might need libcephfs to grow an interface that
> > > > can look up snapped inodes as well?
> > > > 
> > > > Either that, or teach ceph_ll_get_inode how to look up inodes
> > > > that are not in its cache. Would that be a better fix here?
> > > > 
> > > >  src/FSAL/FSAL_CEPH/export.c | 19 +++++++++++++++----
> > > >  1 file changed, 15 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/src/FSAL/FSAL_CEPH/export.c b/src/FSAL/FSAL_CEPH/export.c
> > > > index 3de25fdfaf41..a31e3bb3ebef 100644
> > > > --- a/src/FSAL/FSAL_CEPH/export.c
> > > > +++ b/src/FSAL/FSAL_CEPH/export.c
> > > > @@ -234,12 +234,23 @@ static fsal_status_t create_handle(struct fsal_export *export_pub,
> > > >                 return status;
> > > >         }
> > > > 
> > > > +       /* Check our local cache first */
> > > >         i = ceph_ll_get_inode(export->cmount, *vi);
> > > > -       if (!i)
> > > > -               return ceph2fsal_error(-ESTALE);
> > > > +       if (!i) {
> > > > +               /*
> > > > +                * Try the slow way, may not be in cache now.
> > > > +                *
> > > > +                * Currently, there is no interface for looking up a snapped
> > > > +                * inode, so we just bail here in that case.
> > > > +                */
> > > > +               if (vi->snapid.val != CEPH_NOSNAP)
> > > > +                       return ceph2fsal_error(-ESTALE);
> > > > +
> > > > +               rc = ceph_ll_lookup_inode(export->cmount, vi->ino, &i);
> > > > +               if (rc)
> > > > +                       return ceph2fsal_error(rc);
> > > > +       }
> > > > 
> > > > -       /* The ceph_ll_connectable_m should have populated libceph's
> > > > -          cache with all this anyway */
> > > >         rc = fsal_ceph_ll_getattr(export->cmount, i, &stx,
> > > >                 attrs_out ? CEPH_STATX_ATTR_MASK : CEPH_STATX_HANDLE_MASK,
> > > >                 op_ctx->creds);
> > > > --
> > > > 2.13.6
> > > > 
> > > 
> > > Looks good to me,
> > > 
> > > Tested-by: "Yan, Zheng" <zyan@xxxxxxxxxx>
> > > 
> > 
> > Thanks. I still wonder whether this is a sufficient fix.
> > 
> > ceph_ll_lookup_inode does not take the snapid into account. Do we need a
> > version of ceph_ll_lookup_inode that can find snapshotted inodes after
> > ganesha has restarted?
> > 
> 
> Looking up a snapped inode by number hasn't been implemented in the MDS.
> Not sure if we can extend the backtrace to support snapped inodes.
> 
> Regards
> Yan, Zheng
> 

Huh, ok... I looked over the cephfs code a bit yesterday and thought
that CEPH_MDS_OP_LOOKUPSNAP might do the right thing? I don't really
understand the cephfs snapshot model at all though, so I'll take your
word for it. :)
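
For reference, the sort of interface I was imagining is something like
the sketch below. This is purely hypothetical -- no such call exists in
libcephfs today, and the name and signature are just my guess at what it
might look like:

    /*
     * Purely hypothetical: a snap-aware lookup that takes the full
     * vinodeno_t (ino + snapid) instead of just the inode number, so a
     * snapped inode could be found again after ganesha restarts with a
     * cold client cache. Nothing like this exists in libcephfs today.
     */
    int ceph_ll_lookup_vino(struct ceph_mount_info *cmount, vinodeno_t vino,
                            struct Inode **out);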

In that case, I guess this is probably the best we can do for now, along
with a recommendation to avoid snapshots when working with ganesha.

If we were worried about it, we could probably prevent ganesha from
using non-CEPH_NOSNAP inodes at all: have any attempt to use one just
return a well-known error (NFS4ERR_NXIO or something).
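
Roughly something like this at the top of create_handle(), for instance.
Just a sketch -- whether ceph2fsal_error() maps -ENXIO to something the
client handles sensibly is an assumption on my part:

    /*
     * Sketch: refuse to hand out snapped inodes at all, before we even
     * consult the local cache. Assumes ceph2fsal_error() turns -ENXIO
     * into an error the NFS client can cope with.
     */
    if (vi->snapid.val != CEPH_NOSNAP)
            return ceph2fsal_error(-ENXIO);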

> > It does seem like we could allow an NFS client to do work in a snapshotted
> > directory, and then have ganesha restart. That client may be very
> > confused when the server comes back and all of the existing filehandles
> > go stale.
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>

-- 
Jeff Layton <jlayton@xxxxxxxxxx>


