On Wed, 2023-03-08 at 17:14 +0000, Luís Henriques wrote: > Xiubo Li <xiubli@xxxxxxxxxx> writes: > > > On 08/03/2023 17:29, Luís Henriques wrote: > > > Xiubo Li <xiubli@xxxxxxxxxx> writes: > > > > > > > On 08/03/2023 02:53, Luís Henriques wrote: > > > > > xiubli@xxxxxxxxxx writes: > > > > > > > > > > > From: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > > > > > If we have a dentry which represents a no-key name, then we need to test > > > > > > whether the parent directory's encryption key has since been added. Do > > > > > > that before we test anything else about the dentry. > > > > > > > > > > > > Reviewed-by: Xiubo Li <xiubli@xxxxxxxxxx> > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > --- > > > > > > fs/ceph/dir.c | 8 ++++++-- > > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c > > > > > > index d3c2853bb0f1..5ead9f59e693 100644 > > > > > > --- a/fs/ceph/dir.c > > > > > > +++ b/fs/ceph/dir.c > > > > > > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) > > > > > > struct inode *dir, *inode; > > > > > > struct ceph_mds_client *mdsc; > > > > > > + valid = fscrypt_d_revalidate(dentry, flags); > > > > > > + if (valid <= 0) > > > > > > + return valid; > > > > > > + > > > > > This patch has confused me in the past, and today I found myself > > > > > scratching my head again looking at it. > > > > > > > > > > So, I've started seeing generic/123 test failing when running it with > > > > > test_dummy_encryption. I was almost sure that this test used to run fine > > > > > before, but I couldn't find any evidence (somehow I lost my old testing > > > > > logs...). > > > > > > > > > > Anyway, the test is quite simple: > > > > > > > > > > 1. Creates a directory with write permissions for root only > > > > > 2. Writes into a file in that directory > > > > > 3. Uses 'su' to try to modify that file as a different user, and > > > > > gets -EPERM > > > > > > > > > > All these steps run fine, and the test should pass. *However*, in the > > > > > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. > > > > > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' > > > > > and then -ENOTEMPTY for the directory. > > > > > > > > > > Some digging allowed me to figure out that running commands with 'su' will > > > > > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is > > > > > how I ended up looking at this patch. fscrypt_d_revalidate() will return > > > > > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). > > > > > Can we really say here that the dentry is *not* valid in that case? Or > > > > > should that '<= 0' be a '< 0'? > > > > > > > > > > (But again, this patch has confused me before...) > > > > Luis, > > > > > > > > Could you reproduce it with the latest testing branch ? > > > Yes, I'm seeing this with the latest code. > > > > Okay. That's odd. > > > > BTW, are you using the non-root user to run the test ? > > > > Locally I am using the root user and still couldn't reproduce it. > > Yes, I'm running the tests as root but I've also 'fsgqa' user in the > system (which is used by this test. Anyway, for reference, here's what > I'm using in my fstests configuration: > > TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" > > > > > > > > I never seen the generic/123 failure yet. And just now I ran the test for many > > > > times locally it worked fine. > > > That's odd. With 'test_dummy_encryption' mount option I can reproduce it > > > every time. > > > > > > > From the generic/123 test code it will never touch the key while testing, that > > > > means the dentries under the test dir will always have the keyed name. And then > > > > the 'fscrypt_d_revalidate()' should return 1 always. > > > > > > > > Only when we remove the key will it trigger evicting the inodes and then when we > > > > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the > > > > 'fscrypt_has_encryption_key()'. > > > > > > > > As I remembered we have one or more fixes about this those days, not sure > > > > whether you were hitting those bugs we have already fixed ? > > > Yeah, I remember now, and I guess there's yet another one here! > > > > > > I'll look closer into this and see if I can find out something else. I'm > > > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the > > > bug is in the error paths, when the 'fsgqa' user tries to write into the > > > file. > > > > Please add some debug logs in the code. > > I *think* I've something. The problem seems to be that, after the > drop_caches, the test directory is evicted and ceph_evict_inode() will > call fscrypt_put_encryption_info(). This last function will clear the > inode fscrypt info. Later on, when the test tries to write to the file > with: > > _user_do "echo goo >> $my_test_subdir/data_coherency.txt" > > function ceph_atomic_open() will correctly identify that '$my_test_subdir' > is encrypted, but the key isn't set because the inode was evicted. This > means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME > will be *incorrectly* added to the 'data_coherency.txt' dentry flags. > > Later on, ceph_d_revalidate() will see the problem I initially described. > > The (RFC) patch bellow seems to fix the issue. Basically, it will force > the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() > and the fscrypt_has_encryption_key() will then return 'true'. > > Cheers > -- > Luís > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index dee3b445f415..3f2df84a6323 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -795,7 +795,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, > ihold(dir); > if (IS_ENCRYPTED(dir)) { > set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); > - if (!fscrypt_has_encryption_key(dir)) { > + err = __fscrypt_prepare_readdir(dir); I want to say that i had something like this in place during an earlier version of this series, but for different reasons. I think I convinced myself later though that it wasn't needed? Oh well... > + if (err || (!err && !fscrypt_has_encryption_key(dir))) { > spin_lock(&dentry->d_lock); > dentry->d_flags |= DCACHE_NOKEY_NAME; > spin_unlock(&dentry->d_lock); Once an inode is evicted, my understanding was that it won't end up being used anymore. It's on its way out of the cache and it's not hashed anymore at that point. How does a new atomic open after drop_caches end up with the inode struct that existed before it? -- Jeff Layton <jlayton@xxxxxxxxxx>