Re: [PATCH v6 10/13] ceph: decode interval_sets for delegated inos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-03-05 at 14:36 +0100, Ilya Dryomov wrote:
> On Thu, Mar 5, 2020 at 1:03 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Thu, 2020-03-05 at 11:45 +0000, Luis Henriques wrote:
> > > On Mon, Mar 02, 2020 at 09:14:31AM -0500, Jeff Layton wrote:
> > > > Starting in Octopus, the MDS will hand out caps that allow the client
> > > > to do asynchronous file creates under certain conditions. As part of
> > > > that, the MDS will delegate ranges of inode numbers to the client.
> > > > 
> > > > Add the infrastructure to decode these ranges, and stuff them into an
> > > > xarray for later consumption by the async creation code.
> > > > 
> > > > Because the xarray code currently only handles unsigned long indexes,
> > > > and those are 32-bits on 32-bit arches, we only enable the decoding when
> > > > running on a 64-bit arch.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > > >  fs/ceph/mds_client.c | 122 +++++++++++++++++++++++++++++++++++++++----
> > > >  fs/ceph/mds_client.h |   9 +++-
> > > >  2 files changed, 121 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > index db8304447f35..87f75d05b004 100644
> > > > --- a/fs/ceph/mds_client.c
> > > > +++ b/fs/ceph/mds_client.c
> > > > @@ -415,21 +415,121 @@ static int parse_reply_info_filelock(void **p, void *end,
> > > >     return -EIO;
> > > >  }
> > > > 
> > > > +
> > > > +#if BITS_PER_LONG == 64
> > > > +
> > > > +#define DELEGATED_INO_AVAILABLE            xa_mk_value(1)
> > > > +
> > > > +static int ceph_parse_deleg_inos(void **p, void *end,
> > > > +                            struct ceph_mds_session *s)
> > > > +{
> > > > +   u32 sets;
> > > > +
> > > > +   ceph_decode_32_safe(p, end, sets, bad);
> > > > +   dout("got %u sets of delegated inodes\n", sets);
> > > > +   while (sets--) {
> > > > +           u64 start, len, ino;
> > > > +
> > > > +           ceph_decode_64_safe(p, end, start, bad);
> > > > +           ceph_decode_64_safe(p, end, len, bad);
> > > > +           while (len--) {
> > > > +                   int err = xa_insert(&s->s_delegated_inos, ino = start++,
> > > > +                                       DELEGATED_INO_AVAILABLE,
> > > > +                                       GFP_KERNEL);
> > > > +                   if (!err) {
> > > > +                           dout("added delegated inode 0x%llx\n",
> > > > +                                start - 1);
> > > > +                   } else if (err == -EBUSY) {
> > > > +                           pr_warn("ceph: MDS delegated inode 0x%llx more than once.\n",
> > > > +                                   start - 1);
> > > > +                   } else {
> > > > +                           return err;
> > > > +                   }
> > > > +           }
> > > > +   }
> > > > +   return 0;
> > > > +bad:
> > > > +   return -EIO;
> > > > +}
> > > > +
> > > > +u64 ceph_get_deleg_ino(struct ceph_mds_session *s)
> > > > +{
> > > > +   unsigned long ino;
> > > > +   void *val;
> > > > +
> > > > +   xa_for_each(&s->s_delegated_inos, ino, val) {
> > > > +           val = xa_erase(&s->s_delegated_inos, ino);
> > > > +           if (val == DELEGATED_INO_AVAILABLE)
> > > > +                   return ino;
> > > > +   }
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino)
> > > > +{
> > > > +   return xa_insert(&s->s_delegated_inos, ino, DELEGATED_INO_AVAILABLE,
> > > > +                    GFP_KERNEL);
> > > > +}
> > > > +#else /* BITS_PER_LONG == 64 */
> > > > +/*
> > > > + * FIXME: xarrays can't handle 64-bit indexes on a 32-bit arch. For now, just
> > > > + * ignore delegated_inos on 32 bit arch. Maybe eventually add xarrays for top
> > > > + * and bottom words?
> > > > + */
> > > > +static int ceph_parse_deleg_inos(void **p, void *end,
> > > > +                            struct ceph_mds_session *s)
> > > > +{
> > > > +   u32 sets;
> > > > +
> > > > +   ceph_decode_32_safe(p, end, sets, bad);
> > > > +   if (sets)
> > > > +           ceph_decode_skip_n(p, end, sets * 2 * sizeof(__le64), bad);
> > > > +   return 0;
> > > > +bad:
> > > > +   return -EIO;
> > > > +}
> > > > +
> > > > +u64 ceph_get_deleg_ino(struct ceph_mds_session *s)
> > > > +{
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +int ceph_restore_deleg_ino(struct ceph_mds_session *s, u64 ino)
> > > > +{
> > > > +   return 0;
> > > > +}
> > > > +#endif /* BITS_PER_LONG == 64 */
> > > > +
> > > >  /*
> > > >   * parse create results
> > > >   */
> > > >  static int parse_reply_info_create(void **p, void *end,
> > > >                               struct ceph_mds_reply_info_parsed *info,
> > > > -                             u64 features)
> > > > +                             u64 features, struct ceph_mds_session *s)
> > > >  {
> > > > +   int ret;
> > > > +
> > > >     if (features == (u64)-1 ||
> > > >         (features & CEPH_FEATURE_REPLY_CREATE_INODE)) {
> > > > -           /* Malformed reply? */
> > > >             if (*p == end) {
> > > > +                   /* Malformed reply? */
> > > >                     info->has_create_ino = false;
> > > > -           } else {
> > > > +           } else if (test_bit(CEPHFS_FEATURE_DELEG_INO, &s->s_features)) {
> > > > +                   u8 struct_v, struct_compat;
> > > > +                   u32 len;
> > > > +
> > > >                     info->has_create_ino = true;
> > > > +                   ceph_decode_8_safe(p, end, struct_v, bad);
> > > > +                   ceph_decode_8_safe(p, end, struct_compat, bad);
> > > > +                   ceph_decode_32_safe(p, end, len, bad);
> > > > +                   ceph_decode_64_safe(p, end, info->ino, bad);
> > > 
> > > I've done a quick test in current 'testing' branch and it seems that it's
> > > currently broken.  A bisect identified this commit as 'bad' and it's
> > > failing at this point.
> > > 
> > > I'm running an old (a few weeks) 'master' vstart cluster, so I don't have
> > > the needed bits for using this DELEG_INO feature.  Running xfstest
> > > generic/001 results in:
> > > 
> > >    ceph: mds parse_reply err -5
> > >    ceph: mdsc_handle_reply got corrupt reply mds0(tid:9)
> > >    ...
> > > 
> > > s->s_features does include the CEPHFS_FEATURE_DELEG_INO bit set;
> > > 'features' is -1 (0xffffffffffffffff) and s->s_features is 0x3fff.  Maybe
> > > the issue is actually somewhere else (the cephfs feature handling code),
> > > but I'm still looking.
> > > 
> > 
> > From the patch that added this feature in userland ceph code (commit
> > 2bcf4b62643b5):
> > 
> > --- a/src/mds/cephfs_features.h
> > +++ b/src/mds/cephfs_features.h
> > @@ -32,6 +32,7 @@
> >  #define CEPHFS_FEATURE_LAZY_CAP_WANTED  11
> >  #define CEPHFS_FEATURE_MULTI_RECONNECT  12
> >  #define CEPHFS_FEATURE_NAUTILUS         12
> > +#define CEPHFS_FEATURE_DELEG_INO        13
> >  #define CEPHFS_FEATURE_OCTOPUS          13
> > 
> >  #define CEPHFS_FEATURES_ALL {          \
> > @@ -45,6 +46,7 @@
> >    CEPHFS_FEATURE_LAZY_CAP_WANTED,      \
> >    CEPHFS_FEATURE_MULTI_RECONNECT,      \
> >    CEPHFS_FEATURE_NAUTILUS,              \
> > +  CEPHFS_FEATURE_DELEG_INO,             \
> >    CEPHFS_FEATURE_OCTOPUS,               \
> >  }
> > 
> > ...this feature was added under the aegis of the
> > CEPHFS_FEATURE_DELEG_INO flag, but that bit is shared with
> > CEPHFS_FEATURE_OCTOPUS, which was already enabled in octopus before we
> > ever added it (back on April 1st 2019).
> > 
> > Any version of the MDS that has commit 49930ad8a3402 but does not have
> > 2bcf4b62643b5 will not work properly with newer kernels. Personally, I
> > don't see that as a problem per-se, as that should only be the case with
> > bleeding-edge MDS builds. Official releases should never see this issue.
> > 
> > Going forward, I think commit 49930ad8a3402 was probably a bad idea. We
> > really should not add "release" cephfs feature bits to the mask until
> > just before an official release, and should just make it alias the last
> > "real" feature bit. That should help ensure that we don't hit this
> > problem in the future.
> 
> We should avoid "release" feature bits altogether, as discussed in
> the last CDM.  They appeared because we were running out of free bits
> for RADOS features (a 64-bit field).  CephFS features are encoded in
> a bit vector that can grow as needed.
> 
> Feature masks for groups of related feature bits make sense, but they
> should be handled at a higher level.  E.g. a set of feature bits that
> a client should support in order for inline data to work properly that
> we flip to required if the user runs "ceph fs set <fsname> inline_data
> true".  The actual feature bits in the bit vector should all be
> distinct -- no overlaps.
> 

Yes, sorry, that was what we discussed, and that sounds fine to me.

The only reason those tags exist at all is that userland ceph has a
"min_compat_client" setting that disallows clients that don't support
release "X" from mounting.

I don't recall whether we had a plan to remove that setting from the
userland ceph code. Did we? It seems like if we're moving away from
declaring a particular release feature bit, then we should deprecate
min_compat_client.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux