Xiubo Li <xiubli@xxxxxxxxxx> writes: > On 3/17/22 11:45 PM, Luís Henriques wrote: >> When creating a snapshot, the .snap directories for every subdirectory will >> show the snapshot name in the "long format": >> >> # mkdir .snap/my-snap >> # ls my-dir/.snap/ >> _my-snap_1099511627782 >> >> Encrypted snapshots will need to be able to handle these snapshot names by >> encrypting/decrypting only the snapshot part of the string ('my-snap'). >> >> Also, since the MDS prevents snapshot names to be bigger than 240 characters >> it is necessary to adapt CEPH_NOHASH_NAME_MAX to accommodate this extra >> limitation. >> >> Signed-off-by: Luís Henriques <lhenriques@xxxxxxx> >> --- >> fs/ceph/crypto.c | 189 ++++++++++++++++++++++++++++++++++++++++------- >> fs/ceph/crypto.h | 11 ++- >> 2 files changed, 169 insertions(+), 31 deletions(-) >> >> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c >> index beb73bbdd868..caa9863dee93 100644 >> --- a/fs/ceph/crypto.c >> +++ b/fs/ceph/crypto.c >> @@ -128,16 +128,100 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se >> swap(req->r_fscrypt_auth, as->fscrypt_auth); >> } >> -int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr >> *d_name, char *buf) >> +/* >> + * User-created snapshots can't start with '_'. Snapshots that start with this >> + * character are special (hint: there aren't real snapshots) and use the >> + * following format: >> + * >> + * _<SNAPSHOT-NAME>_<INODE-NUMBER> >> + * >> + * where: >> + * - <SNAPSHOT-NAME> - the real snapshot name that may need to be decrypted, >> + * - <INODE-NUMBER> - the inode number for the actual snapshot >> + * >> + * This function parses these snapshot names and returns the inode >> + * <INODE-NUMBER>. 'name_len' will also bet set with the <SNAPSHOT-NAME> >> + * length. >> + */ >> +static struct inode *parse_longname(const struct inode *parent, const char *name, >> + int *name_len) >> { >> + struct inode *dir = NULL; >> + struct ceph_vino vino = { .snap = CEPH_NOSNAP }; >> + char *inode_number; >> + char *name_end; >> + int orig_len = *name_len; >> + int ret = -EIO; >> + >> + /* Skip initial '_' */ >> + name++; >> + name_end = strrchr(name, '_'); >> + if (!name_end) { >> + dout("Failed to parse long snapshot name: %s\n", name); >> + return ERR_PTR(-EIO); >> + } >> + *name_len = (name_end - name); >> + if (*name_len <= 0) { >> + pr_err("Failed to parse long snapshot name\n"); >> + return ERR_PTR(-EIO); >> + } >> + >> + /* Get the inode number */ >> + inode_number = kmemdup_nul(name_end + 1, >> + orig_len - *name_len - 2, >> + GFP_KERNEL); >> + if (!inode_number) >> + return ERR_PTR(-ENOMEM); >> + ret = kstrtou64(inode_number, 0, &vino.ino); >> + if (ret) { >> + dout("Failed to parse inode number: %s\n", name); >> + dir = ERR_PTR(ret); >> + goto out; >> + } >> + >> + /* And finally the inode */ >> + dir = ceph_find_inode(parent->i_sb, vino); >> + if (!dir) { >> + /* This can happen if we're not mounting cephfs on the root */ >> + dir = ceph_get_inode(parent->i_sb, vino, NULL); > > In this case IMO you should lookup the inode from MDS instead create it in the > cache, which won't setup the encryption info needed. > > So later when you try to use this to dencrypt the snapshot names, you will hit > errors ? And also the case Jeff mentioned in previous thread could happen. No, I don't see any errors. The reason is that if we get a I_NEW inode, we do not have the keys to even decrypt the names. If you mount a filesystem using as root a directory that is inside an encrypted directory, you'll see the encrypted snapshot name: # mkdir mydir # fscrypt encrypt mydir # mkdir -p mydir/a/b/c/d # mkdir mydir/a/.snap/myspan # umount ... # mount <mon>:<port>:/a # ls .snap And we simply can't decrypt it because for that we'd need to have access to the .fscrypt in the original filesystem mount root. I haven't tested NFS over ceph (I don't currently have a test environment for doing that), but I *think* the same thing will happen. (I can try to setup this test environment in the next couple of days.) > I figured out another approach could resolve this more gracefully: I took a quick look at the PR and the client patch but I suspect that Jeff is right: this approach may greatly reduce security, which is definitely not desirable. Cheers, -- Luís > For all the subdirs just let them inherit the encryption info from the same > ancestor, which is initially encrypted, then in ceph_new_inode() you can just > skip setting up the encryption info for all the subdirs and in MDS side will > send back the parent's encryption info and fill it in handle_reply(), this is > just what the .snap does. > > Then here you can use current inode to do the dencryption for all the snapshots > including the long snapshot names. > > I have raise one PR and send a kclient patch for the above basic framework > [1][2]. But there still need a little more work you need to do based them: > > When lssnap you need to add one flag in LeaseStat to tell the kclient whether > the long snap names are encrypted, this is very easy in MDS side. Then in > kclient side you can just skip dencrypting the long snap names which are from > none-encyrpted parents and for all the other just use current inode to do the > dencryption. No need to search the parent inodes for long snaps. > > And when lookuping a long snap name, which could be encyrpted and could be not, > then you need to parse the inode out and lookup the inode from MDS if it does > not exist in cache. > > > [1] https://github.com/ceph/ceph/pull/45516 > > [2] https://patchwork.kernel.org/project/ceph-devel/list/?series=624492 > > >> + if (!dir) >> + dir = ERR_PTR(-ENOENT); >> + } >> + if (IS_ERR(dir)) >> + dout("Can't find inode %s (%s)\n", inode_number, name); >> + >> +out: >> + kfree(inode_number); >> + return dir; >> +} >> + >> +int ceph_encode_encrypted_dname(struct inode *parent, struct qstr *d_name, char *buf) >> +{ >> + struct inode *dir = parent; >> + struct qstr iname; >> u32 len; >> + int name_len; >> int elen; >> int ret; >> - u8 *cryptbuf; >> + u8 *cryptbuf = NULL; >> + >> + iname.name = d_name->name; >> + name_len = d_name->len; >> + >> + /* Handle the special case of snapshot names that start with '_' */ >> + if ((ceph_snap(dir) == CEPH_SNAPDIR) && (name_len > 0) && >> + (iname.name[0] == '_')) { >> + dir = parse_longname(parent, iname.name, &name_len); >> + if (IS_ERR(dir)) >> + return PTR_ERR(dir); >> + iname.name++; /* skip initial '_' */ >> + } >> + iname.len = name_len; >> - if (!fscrypt_has_encryption_key(parent)) { >> + if (!fscrypt_has_encryption_key(dir)) { >> memcpy(buf, d_name->name, d_name->len); >> - return d_name->len; >> + elen = d_name->len; >> + goto out; >> } >> /* >> @@ -146,18 +230,22 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, >> * >> * See: fscrypt_setup_filename >> */ >> - if (!fscrypt_fname_encrypted_size(parent, d_name->len, NAME_MAX, &len)) >> - return -ENAMETOOLONG; >> + if (!fscrypt_fname_encrypted_size(dir, iname.len, NAME_MAX, &len)) { >> + elen = -ENAMETOOLONG; >> + goto out; >> + } >> /* Allocate a buffer appropriate to hold the result */ >> cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL); >> - if (!cryptbuf) >> - return -ENOMEM; >> + if (!cryptbuf) { >> + elen = -ENOMEM; >> + goto out; >> + } >> - ret = fscrypt_fname_encrypt(parent, d_name, cryptbuf, len); >> + ret = fscrypt_fname_encrypt(dir, &iname, cryptbuf, len); >> if (ret) { >> - kfree(cryptbuf); >> - return ret; >> + elen = ret; >> + goto out; >> } >> /* hash the end if the name is long enough */ >> @@ -173,12 +261,29 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, >> /* base64 encode the encrypted name */ >> elen = fscrypt_base64url_encode(cryptbuf, len, buf); >> - kfree(cryptbuf); >> dout("base64-encoded ciphertext name = %.*s\n", elen, buf); >> + >> + WARN_ON(elen > (CEPH_NOHASH_NAME_MAX + SHA256_DIGEST_SIZE)); >> + if ((elen > 0) && (dir != parent)) { >> + char tmp_buf[NAME_MAX]; >> + >> + elen = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%ld", >> + elen, buf, dir->i_ino); >> + memcpy(buf, tmp_buf, elen); >> + } >> + >> +out: >> + kfree(cryptbuf); >> + if (dir != parent) { >> + if ((dir->i_state & I_NEW)) >> + discard_new_inode(dir); >> + else >> + iput(dir); >> + } >> return elen; >> } >> -int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry >> *dentry, char *buf) >> +int ceph_encode_encrypted_fname(struct inode *parent, struct dentry *dentry, char *buf) >> { >> WARN_ON_ONCE(!fscrypt_has_encryption_key(parent)); >> @@ -203,29 +308,42 @@ int ceph_encode_encrypted_fname(const struct inode >> *parent, struct dentry *dentr >> int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, >> struct fscrypt_str *oname, bool *is_nokey) >> { >> - int ret; >> + struct inode *dir = fname->dir; >> struct fscrypt_str _tname = FSTR_INIT(NULL, 0); >> struct fscrypt_str iname; >> - >> - if (!IS_ENCRYPTED(fname->dir)) { >> - oname->name = fname->name; >> - oname->len = fname->name_len; >> - return 0; >> - } >> + char *name = fname->name; >> + int name_len = fname->name_len; >> + int ret; >> /* Sanity check that the resulting name will fit in the buffer */ >> if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX) >> return -EIO; >> - ret = __fscrypt_prepare_readdir(fname->dir); >> + /* Handle the special case of snapshot names that start with '_' */ >> + if ((ceph_snap(dir) == CEPH_SNAPDIR) && (name_len > 0) && >> + (name[0] == '_')) { >> + dir = parse_longname(dir, name, &name_len); >> + if (IS_ERR(dir)) >> + return PTR_ERR(dir); >> + name++; /* skip initial '_' */ >> + } >> + >> + if (!IS_ENCRYPTED(dir)) { >> + oname->name = fname->name; >> + oname->len = fname->name_len; >> + ret = 0; >> + goto out_inode; >> + } >> + >> + ret = __fscrypt_prepare_readdir(dir); >> if (ret) >> - return ret; >> + goto out_inode; >> /* >> * Use the raw dentry name as sent by the MDS instead of >> * generating a nokey name via fscrypt. >> */ >> - if (!fscrypt_has_encryption_key(fname->dir)) { >> + if (!fscrypt_has_encryption_key(dir)) { >> if (fname->no_copy) >> oname->name = fname->name; >> else >> @@ -233,7 +351,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, >> oname->len = fname->name_len; >> if (is_nokey) >> *is_nokey = true; >> - return 0; >> + ret = 0; >> + goto out_inode; >> } >> if (fname->ctext_len == 0) { >> @@ -242,11 +361,11 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, >> if (!tname) { >> ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname); >> if (ret) >> - return ret; >> + goto out_inode; >> tname = &_tname; >> } >> - declen = fscrypt_base64url_decode(fname->name, fname->name_len, >> tname->name); >> + declen = fscrypt_base64url_decode(name, name_len, tname->name); >> if (declen <= 0) { >> ret = -EIO; >> goto out; >> @@ -258,9 +377,25 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, struct fscrypt_str *tname, >> iname.len = fname->ctext_len; >> } >> - ret = fscrypt_fname_disk_to_usr(fname->dir, 0, 0, &iname, oname); >> + ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname); >> + if (!ret && (dir != fname->dir)) { >> + char tmp_buf[FSCRYPT_BASE64URL_CHARS(NAME_MAX)]; >> + >> + name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%ld", >> + oname->len, oname->name, dir->i_ino); >> + memcpy(oname->name, tmp_buf, name_len); >> + oname->len = name_len; >> + } >> + >> out: >> fscrypt_fname_free_buffer(&_tname); >> +out_inode: >> + if ((dir != fname->dir) && !IS_ERR(dir)) { >> + if ((dir->i_state & I_NEW)) >> + discard_new_inode(dir); >> + else >> + iput(dir); >> + } >> return ret; >> } >> diff --git a/fs/ceph/crypto.h b/fs/ceph/crypto.h >> index 62f0ddd30dee..3273d076a9e5 100644 >> --- a/fs/ceph/crypto.h >> +++ b/fs/ceph/crypto.h >> @@ -82,13 +82,16 @@ static inline u32 ceph_fscrypt_auth_len(struct ceph_fscrypt_auth *fa) >> * struct fscrypt_ceph_nokey_name { >> * u8 bytes[157]; >> * u8 sha256[SHA256_DIGEST_SIZE]; >> - * }; // 189 bytes => 252 bytes base64-encoded, which is <= NAME_MAX (255) >> + * }; // 180 bytes => 240 bytes base64-encoded, which is <= NAME_MAX (255) >> + * >> + * (240 bytes is the maximum size allowed for snapshot names to take into >> + * account the format: '_<SNAPSHOT-NAME>_<INODE-NUMBER>'.) >> * >> * Note that for long names that end up having their tail portion hashed, we >> * must also store the full encrypted name (in the dentry's alternate_name >> * field). >> */ >> -#define CEPH_NOHASH_NAME_MAX (189 - SHA256_DIGEST_SIZE) >> +#define CEPH_NOHASH_NAME_MAX (180 - SHA256_DIGEST_SIZE) >> void ceph_fscrypt_set_ops(struct super_block *sb); >> @@ -97,8 +100,8 @@ void ceph_fscrypt_free_dummy_policy(struct ceph_fs_client >> *fsc); >> int ceph_fscrypt_prepare_context(struct inode *dir, struct inode *inode, >> struct ceph_acl_sec_ctx *as); >> void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_sec_ctx *as); >> -int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf); >> -int ceph_encode_encrypted_fname(const struct inode *parent, struct dentry *dentry, char *buf); >> +int ceph_encode_encrypted_dname(struct inode *parent, struct qstr *d_name, char *buf); >> +int ceph_encode_encrypted_fname(struct inode *parent, struct dentry *dentry, char *buf); >> static inline int ceph_fname_alloc_buffer(struct inode *parent, struct >> fscrypt_str *fname) >> { >> >