Jeff Layton <jlayton@xxxxxxxxxx> writes: > On Wed, 2023-03-08 at 17:14 +0000, Luís Henriques wrote: >> Xiubo Li <xiubli@xxxxxxxxxx> writes: >> >> > On 08/03/2023 17:29, Luís Henriques wrote: >> > > Xiubo Li <xiubli@xxxxxxxxxx> writes: >> > > >> > > > On 08/03/2023 02:53, Luís Henriques wrote: >> > > > > xiubli@xxxxxxxxxx writes: >> > > > > >> > > > > > From: Jeff Layton <jlayton@xxxxxxxxxx> >> > > > > > >> > > > > > If we have a dentry which represents a no-key name, then we need to test >> > > > > > whether the parent directory's encryption key has since been added. Do >> > > > > > that before we test anything else about the dentry. >> > > > > > >> > > > > > Reviewed-by: Xiubo Li <xiubli@xxxxxxxxxx> >> > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> >> > > > > > --- >> > > > > > fs/ceph/dir.c | 8 ++++++-- >> > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) >> > > > > > >> > > > > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c >> > > > > > index d3c2853bb0f1..5ead9f59e693 100644 >> > > > > > --- a/fs/ceph/dir.c >> > > > > > +++ b/fs/ceph/dir.c >> > > > > > @@ -1770,6 +1770,10 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags) >> > > > > > struct inode *dir, *inode; >> > > > > > struct ceph_mds_client *mdsc; >> > > > > > + valid = fscrypt_d_revalidate(dentry, flags); >> > > > > > + if (valid <= 0) >> > > > > > + return valid; >> > > > > > + >> > > > > This patch has confused me in the past, and today I found myself >> > > > > scratching my head again looking at it. >> > > > > >> > > > > So, I've started seeing generic/123 test failing when running it with >> > > > > test_dummy_encryption. I was almost sure that this test used to run fine >> > > > > before, but I couldn't find any evidence (somehow I lost my old testing >> > > > > logs...). >> > > > > >> > > > > Anyway, the test is quite simple: >> > > > > >> > > > > 1. Creates a directory with write permissions for root only >> > > > > 2. Writes into a file in that directory >> > > > > 3. Uses 'su' to try to modify that file as a different user, and >> > > > > gets -EPERM >> > > > > >> > > > > All these steps run fine, and the test should pass. *However*, in the >> > > > > test cleanup function, a simple 'rm -rf <dir>' will fail with -ENOTEMPTY. >> > > > > 'strace' shows that calling unlinkat() to remove the file got a '-ENOENT' >> > > > > and then -ENOTEMPTY for the directory. >> > > > > >> > > > > Some digging allowed me to figure out that running commands with 'su' will >> > > > > drop caches (I see 'su (874): drop_caches: 2' in the log). And this is >> > > > > how I ended up looking at this patch. fscrypt_d_revalidate() will return >> > > > > '0' if the parent directory does has a key (fscrypt_has_encryption_key()). >> > > > > Can we really say here that the dentry is *not* valid in that case? Or >> > > > > should that '<= 0' be a '< 0'? >> > > > > >> > > > > (But again, this patch has confused me before...) >> > > > Luis, >> > > > >> > > > Could you reproduce it with the latest testing branch ? >> > > Yes, I'm seeing this with the latest code. >> > >> > Okay. That's odd. >> > >> > BTW, are you using the non-root user to run the test ? >> > >> > Locally I am using the root user and still couldn't reproduce it. >> >> Yes, I'm running the tests as root but I've also 'fsgqa' user in the >> system (which is used by this test. Anyway, for reference, here's what >> I'm using in my fstests configuration: >> >> TEST_FS_MOUNT_OPTS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> MOUNT_OPTIONS="-o name=admin,secret=<key>,copyfrom,ms_mode=crc,test_dummy_encryption" >> >> > > >> > > > I never seen the generic/123 failure yet. And just now I ran the test for many >> > > > times locally it worked fine. >> > > That's odd. With 'test_dummy_encryption' mount option I can reproduce it >> > > every time. >> > > >> > > > From the generic/123 test code it will never touch the key while testing, that >> > > > means the dentries under the test dir will always have the keyed name. And then >> > > > the 'fscrypt_d_revalidate()' should return 1 always. >> > > > >> > > > Only when we remove the key will it trigger evicting the inodes and then when we >> > > > add the key back will the 'fscrypt_d_revalidate()' return 0 by checking the >> > > > 'fscrypt_has_encryption_key()'. >> > > > >> > > > As I remembered we have one or more fixes about this those days, not sure >> > > > whether you were hitting those bugs we have already fixed ? >> > > Yeah, I remember now, and I guess there's yet another one here! >> > > >> > > I'll look closer into this and see if I can find out something else. I'm >> > > definitely seeing 'fscrypt_d_revalidate()' returning 0, so probably the >> > > bug is in the error paths, when the 'fsgqa' user tries to write into the >> > > file. >> > >> > Please add some debug logs in the code. >> >> I *think* I've something. The problem seems to be that, after the >> drop_caches, the test directory is evicted and ceph_evict_inode() will >> call fscrypt_put_encryption_info(). This last function will clear the >> inode fscrypt info. Later on, when the test tries to write to the file >> with: >> >> _user_do "echo goo >> $my_test_subdir/data_coherency.txt" >> >> function ceph_atomic_open() will correctly identify that '$my_test_subdir' >> is encrypted, but the key isn't set because the inode was evicted. This >> means that fscrypt_has_encryption_key() will return '0' and DCACHE_NOKEY_NAME >> will be *incorrectly* added to the 'data_coherency.txt' dentry flags. >> >> Later on, ceph_d_revalidate() will see the problem I initially described. >> >> The (RFC) patch bellow seems to fix the issue. Basically, it will force >> the fscrypt info to be set in the directory by calling __fscrypt_prepare_readdir() >> and the fscrypt_has_encryption_key() will then return 'true'. >> > > >> Cheers >> -- >> Luís >> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> index dee3b445f415..3f2df84a6323 100644 >> --- a/fs/ceph/file.c >> +++ b/fs/ceph/file.c >> @@ -795,7 +795,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry, >> ihold(dir); >> if (IS_ENCRYPTED(dir)) { >> set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags); >> - if (!fscrypt_has_encryption_key(dir)) { >> + err = __fscrypt_prepare_readdir(dir); > > I want to say that i had something like this in place during an earlier > version of this series, but for different reasons. I think I convinced > myself later though that it wasn't needed? Oh well... Ah, good to know it _may_ make sense :-) >> + if (err || (!err && !fscrypt_has_encryption_key(dir))) { >> spin_lock(&dentry->d_lock); >> dentry->d_flags |= DCACHE_NOKEY_NAME; >> spin_unlock(&dentry->d_lock); > > Once an inode is evicted, my understanding was that it won't end up > being used anymore. It's on its way out of the cache and it's not hashed > anymore at that point. > > How does a new atomic open after drop_caches end up with the inode > struct that existed before it? Hmm... so, I *think* that what's happening is that it is a new inode but the key is still available. Looking at the code it seems that fscrypt will get the context (->get_context()) from ceph code and then fscrypt_setup_encryption_info() should initialize everything in the inode. And at that point fscrypt_has_encryption_key() will finally return 'true'. Does this make sense? Cheers, -- Luís