On Fri, Mar 29, 2019 at 10:02 AM koishi komeiji <maykagura@xxxxxxxxx> wrote: >> I imagine that if you restart container or just drop caches problem >> goes away? > > I try to drop caches, the problem goes away immediately. > So I think the problem is about dentry and inode cache. > > # ls -l | grep invalid > ls: cannot read symbolic link librmifm.so: Invalid argument > ls: cannot read symbolic link libdrv_vlan.so: Invalid argument > # > # echo 2 > /proc/sys/vm/drop_caches > # > # ls -l | grep invalid > # > > > >> Do you know if the symlink was just recently created or if it existed before >> container has been started? > > The symlink in the image is ok. when the container start, the symlink > is decompressed from a cpio file.In this time ,it is still ok. Just > after some time, it break suddenly. > > Koishi, I have a theory, but it is not so easy to test. I am not sure if you are able to compile and install a new overlayfs module. Attached is un-tested patch to solve the speculated problem. This problem should not exist in stable kernel >= v4.14, so if you can update the host kernel that would be the best option for you. The problem described in the patch was solved in upstream kernel by commits: 09d8b586731b ovl: move __upperdentry to ovl_inode 31747eda41ef ovl: hash directory inodes for fsnotify But those depend on many other changes, so cannot be easily backported to kernel v4.9. What I speculate that happens is: 1. Some ovl dentry holds a reference on upper dentry 2. upper dentry hold a ref on upper NON-symlink inode 3. overlay inode->i_private point to upper inode, but does not hold a refcount to upper inode 4. overlay inode is hashed by value of upper inode pointer 5. At some point, overlay dentry, upper dentry and upper inode are dropped, but overlay inode remains with elevated refcount (maybe by inotify) and hashed 6. A new lookup of the symlink allocates a new upper inode struct and reuses the address of NON-symlink free upper inode 7. ovl lookup finds the old ovl inode in cache by upper inode address, but the found overlay inode is not a symlink 8. The new ovl dentry is !d_is_symlink, although stat(2) will show i_mode from real inode that is a symlink 9. readlink(2) of ovl dentry will also fail with -EINVAL 10. open(2) on ovl dentry will not try to follow symlink and will eventually fail to open the symlink itself with -ELOOP Thanks, Amir.
From e163516be9f1b82f070b114b852a2d659c755fb2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein <amir73il@xxxxxxxxx> Date: Fri, 29 Mar 2019 14:16:42 +0300 Subject: [PATCH] ovl: grab a reference on real inode The overlay inode->i_private pointer references the real inode. Normally, the real inode refcount is held by the overlay real dentry reference. However, if overlay inode is pinned, for example by inotify watch and caches are dropped, the overlay dentry goes away with the elevated refcounts and then inode->i_private may be left with a dangling referece. If said overlay inode is also upper and hashed (by dead real inode pointer) a new lookup with new real inode that has the same address as old real inode may find the bad overlay inode in cache. Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> --- fs/overlayfs/inode.c | 5 ++++- fs/overlayfs/super.c | 12 +++++++++--- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index 16f6db88c8e5..734230a5244e 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -416,7 +416,10 @@ static int ovl_inode_test(struct inode *inode, void *data) static int ovl_inode_set(struct inode *inode, void *data) { - inode->i_private = (void *) (((unsigned long) data) | OVL_ISUPPER_MASK); + struct inode *realinode = data; + + inode->i_private = (void *) (((unsigned long) igrab(realinode)) | + OVL_ISUPPER_MASK); return 0; } diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index e7c8ac41e288..75500928bba6 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -148,7 +148,7 @@ struct dentry *ovl_dentry_real(struct dentry *dentry) static void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper) { - WRITE_ONCE(inode->i_private, (unsigned long) realinode | + WRITE_ONCE(inode->i_private, (unsigned long) igrab(realinode) | (is_upper ? OVL_ISUPPER_MASK : 0)); } @@ -234,7 +234,7 @@ void ovl_inode_update(struct inode *inode, struct inode *upperinode) WARN_ON(!upperinode); WARN_ON(!inode_unhashed(inode)); WRITE_ONCE(inode->i_private, - (unsigned long) upperinode | OVL_ISUPPER_MASK); + (unsigned long) igrab(upperinode) | OVL_ISUPPER_MASK); if (!S_ISDIR(upperinode->i_mode)) __insert_inode_hash(inode, (unsigned long) upperinode); } @@ -687,12 +687,18 @@ static int ovl_remount(struct super_block *sb, int *flags, char *data) return 0; } +static int ovl_drop_inode(struct inode *inode) +{ + iput(ovl_inode_real(inode)); + return 1; +} + static const struct super_operations ovl_super_operations = { .put_super = ovl_put_super, .statfs = ovl_statfs, .show_options = ovl_show_options, .remount_fs = ovl_remount, - .drop_inode = generic_delete_inode, + .drop_inode = ovl_drop_inode, }; enum { -- 2.7.4