Xiubo Li <xiubli@xxxxxxxxxx> writes: > On 6/2/22 5:26 PM, Luís Henriques wrote: >> Xiubo Li <xiubli@xxxxxxxxxx> writes: >> >>> On 6/2/22 12:29 AM, Luís Henriques wrote: >>>> The MDS tries to enforce a limit on the total key/values in extended >>>> attributes. However, this limit is enforced only if doing a synchronous >>>> operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS >>>> doesn't have a chance to enforce these limits. >>>> >>>> This patch adds support for decoding the xattrs maximum size setting that is >>>> distributed in the mdsmap. Then, when setting an xattr, the kernel client >>>> will revert to do a synchronous operation if that maximum size is exceeded. >>>> >>>> While there, fix a dout() that would trigger a printk warning: >>>> >>>> [ 98.718078] ------------[ cut here ]------------ >>>> [ 98.719012] precision 65536 too large >>>> [ 98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600 >>>> ... >>>> >>>> URL: https://tracker.ceph.com/issues/55725 >>>> Signed-off-by: Luís Henriques <lhenriques@xxxxxxx> >>>> --- >>>> fs/ceph/mdsmap.c | 27 +++++++++++++++++++++++---- >>>> fs/ceph/xattr.c | 12 ++++++++---- >>>> include/linux/ceph/mdsmap.h | 1 + >>>> 3 files changed, 32 insertions(+), 8 deletions(-) >>>> >>>> * Changes since v2 >>>> >>>> Well, a lot has changed since v2! Now the xattr max value setting is >>>> obtained through the mdsmap, which needs to be decoded, and the feature >>>> that was used in the previous revision was dropped. The drawback is that >>>> the MDS isn't unable to know in advance if a client is aware of this xattr >>>> max value. >>>> >>>> * Changes since v1 >>>> >>>> Added support for new feature bit to get the MDS max_xattr_pairs_size >>>> setting. >>>> >>>> Also note that this patch relies on a patch that hasn't been merged yet >>>> ("ceph: use correct index when encoding client supported features"), >>>> otherwise the new feature bit won't be correctly encoded. >>>> >>>> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c >>>> index 30387733765d..36b2bc18ca2a 100644 >>>> --- a/fs/ceph/mdsmap.c >>>> +++ b/fs/ceph/mdsmap.c >>>> @@ -13,6 +13,12 @@ >>>> #include "super.h" >>>> +/* >>>> + * Maximum size of xattrs the MDS can handle per inode by default. This >>>> + * includes the attribute name and 4+4 bytes for the key/value sizes. >>>> + */ >>>> +#define MDS_MAX_XATTR_SIZE (1<<16) /* 64K */ >>>> + >>>> #define CEPH_MDS_IS_READY(i, ignore_laggy) \ >>>> (m->m_info[i].state > 0 && ignore_laggy ? true : !m->m_info[i].laggy) >>>> @@ -352,12 +358,10 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void >>>> *end, bool msgr2) >>>> __decode_and_drop_type(p, end, u8, bad_ext); >>>> } >>>> if (mdsmap_ev >= 8) { >>>> - u32 name_len; >>>> /* enabled */ >>>> ceph_decode_8_safe(p, end, m->m_enabled, bad_ext); >>>> - ceph_decode_32_safe(p, end, name_len, bad_ext); >>>> - ceph_decode_need(p, end, name_len, bad_ext); >>>> - *p += name_len; >>>> + /* fs_name */ >>>> + ceph_decode_skip_string(p, end, bad_ext); >>>> } >>>> /* damaged */ >>>> if (mdsmap_ev >= 9) { >>>> @@ -370,6 +374,21 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end, bool msgr2) >>>> } else { >>>> m->m_damaged = false; >>>> } >>>> + if (mdsmap_ev >= 17) { >>>> + /* balancer */ >>>> + ceph_decode_skip_string(p, end, bad_ext); >>>> + /* standby_count_wanted */ >>>> + ceph_decode_skip_32(p, end, bad_ext); >>>> + /* old_max_mds */ >>>> + ceph_decode_skip_32(p, end, bad_ext); >>>> + /* min_compat_client */ >>>> + ceph_decode_skip_8(p, end, bad_ext); >>> This is incorrect. >>> >>> If mdsmap_ev == 15 the min_compat_client will be a feature_bitset_t instead of >>> int8_t. >> Hmm... can you point me at where that's done in the code? As usual, I'm >> confused with that code and simply can't see that. >> >> Also, if that happens only when mdsmap_ev == 15, then there's no problem >> because that branch is only taken if it's >= 17. > > Yeah, so you should skip 32 or 32+64 bits instead here, just likes: > > 3536 /* version >= 3, feature bits */ > 3537 ceph_decode_32_safe(&p, end, len, bad); > 3538 if (len) { > 3539 ceph_decode_64_safe(&p, end, features, bad); > 3540 p += len - sizeof(features); > 3541 } > > For the ceph code please see: > > Please see https://github.com/ceph/ceph/blob/main/src/mds/MDSMap.cc#L925. I still don't see what your saying. From what I understand, with <= 15 we used to have 'min_compat_client', which is of type 'ceph_release_t', defined in src/common/ceph_releases.h: enum class ceph_release_t : std::uint8_t { ... } Then, starting with >= 16 the MDS ignores this 'min_compat_client' field (but still encodes/decodes it), and it *adds* 'required_client_features', which is a 'feature_bitset_t' and that is decoded immediately after (see bellow, the ceph_decode_skip_set() call). Cheers, -- Luís >>> >>>> + /* required_client_features */ >>>> + ceph_decode_skip_set(p, end, 64, bad_ext); >>>> + ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext); >>>> + } else { >>>> + m->m_max_xattr_size = MDS_MAX_XATTR_SIZE; >>>> + } >>>> bad_ext: >>>> dout("mdsmap_decode m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n", >>>> !!m->m_enabled, !!m->m_damaged, m->m_num_laggy); >>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c >>>> index 8c2dc2c762a4..67f046dac35c 100644 >>>> --- a/fs/ceph/xattr.c >>>> +++ b/fs/ceph/xattr.c >>>> @@ -1086,7 +1086,7 @@ static int ceph_sync_setxattr(struct inode *inode, const char *name, >>>> flags |= CEPH_XATTR_REMOVE; >>>> } >>>> - dout("setxattr value=%.*s\n", (int)size, value); >>>> + dout("setxattr value size: %ld\n", size); >>>> /* do request */ >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS); >>>> @@ -1184,8 +1184,14 @@ int __ceph_setxattr(struct inode *inode, const char *name, >>>> spin_lock(&ci->i_ceph_lock); >>>> retry: >>>> issued = __ceph_caps_issued(ci, NULL); >>>> - if (ci->i_xattrs.version == 0 || !(issued & CEPH_CAP_XATTR_EXCL)) >>>> + required_blob_size = __get_required_blob_size(ci, name_len, val_len); >>>> + if ((ci->i_xattrs.version == 0) || !(issued & CEPH_CAP_XATTR_EXCL) || >>>> + (required_blob_size >= mdsc->mdsmap->m_max_xattr_size)) { >>> Shouldn't it be '>' instead ? >> Ok, I'll fix that. >> >>> We'd better always force to do a sync request with old ceph. Just check if the >>> mdsmap_ev < 17. It's not safe to buffer it because it maybe discarded as your >>> ceph PR does. >> Right, that can be done. So, I can simply set the m_max_xattr_size to '0' >> if mdsmap_ev < 17. Then, this 'if' condition will always be evaluated to >> true because required_blob_size will be > 0. Does that sound OK? > > Yeah, sounds good. > > -- Xiubo > > >> >> Cheers, >