On Thu, 14 Feb 2019, Darrick J. Wong wrote: > [cc the shmem maintainer and the mm list] Yup, thanks - Matej also did so the day after sending to linux-kernel. > > On Thu, Feb 14, 2019 at 03:44:02PM -0800, Andrew Morton wrote: > > (cc linux-fsdevel) Okay, thanks, but a tmpfs peculiarity we think. > > > > On Mon, 11 Feb 2019 15:18:11 +0100 Matej Kupljen <matej.kupljen@xxxxxxxxx> wrote: > > > > > Hi, > > > > > > it seems that when opening file on file system that is mounted on > > > tmpfs with the O_TMPFILE flag and using linkat call after that, it > > > uses 2 inodes instead of 1. > > > > > > This is simple test case: > > > > > > #include <sys/types.h> > > > #include <sys/stat.h> > > > #include <fcntl.h> > > > #include <unistd.h> > > > #include <string.h> > > > #include <stdio.h> > > > #include <stdlib.h> > > > #include <linux/limits.h> > > > #include <errno.h> > > > > > > #define TEST_STRING "Testing\n" > > > > > > #define TMP_PATH "/tmp/ping/" > > > #define TMP_FILE "file.txt" > > > > > > > > > int main(int argc, char* argv[]) > > > { > > > char path[PATH_MAX]; > > > int fd; > > > int rc; > > > > > > fd = open(TMP_PATH, __O_TMPFILE | O_RDWR, > > > S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | > > > S_IROTH | S_IWOTH); > > > > > > rc = write(fd, TEST_STRING, strlen(TEST_STRING)); > > > > > > snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd); > > > linkat(AT_FDCWD, path, AT_FDCWD, TMP_PATH TMP_FILE, AT_SYMLINK_FOLLOW); > > > close(fd); > > > > > > return 0; > > > } > > > > > > I have checked indoes with "df -i" tool. The first inode is used when > > > the call to open is executed and the second one when the call to > > > linkat is executed. > > > It is not decreased when close is executed. > > > > > > I have also tested this on an ext4 mounted fs and there only one inode is used. > > > > > > I tested this on: > > > $ cat /etc/lsb-release > > > DISTRIB_ID=Ubuntu > > > DISTRIB_RELEASE=18.04 > > > DISTRIB_CODENAME=bionic > > > DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS" > > > > > > $ uname -a > > > Linux Orion 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC > > > 2018 x86_64 x86_64 x86_64 GNU/Linux > > Heh, tmpfs and its weird behavior where each new link counts as a new > inode because "each new link needs a new dentry, pinning lowmem, and > tmpfs dentries cannot be pruned until they are unlinked." That's very much a peculiarity of tmpfs, so agreed: it's what I expect to be the cause, but I've not actually tracked it through and fixed yet. > > It seems to have this behavior on 5.0-rc6 too: Yes, it does. > > $ /bin/df -i /tmp ; ./c ; /bin/df -i /tmp > Filesystem Inodes IUsed IFree IUse% Mounted on > tmp 1019110 17 1019093 1% /tmp > Filesystem Inodes IUsed IFree IUse% Mounted on > tmp 1019110 19 1019091 1% /tmp > > Probably because shmem_tmpfile -> shmem_get_inode -> shmem_reserve_inode > which decrements ifree when we create the tmpfile, and then the > d_tmpfile decrements i_nlink to zero. Now we have iused=1, nlink=0, > assuming iused=itotal-ifree like usual. > > Then the linkat call does: > > shmem_link -> shmem_reserve_inode > > which decrements ifree again and increments i_nlink to 1. Now we have > iused=2, nlink=1. > > The program exits, which closes the file. /tmp/ping/file.txt still > exists and we haven't evicted inodes yet, so nothing much happens. > > But then I added in rm -rf /tmp/ping/file.txt to see what happens. > shmem_unlink contains this: > > if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)) > shmem_free_inode(inode->i_sb); > > So shmem_iunlink *doesnt* decrement ifree but does drop the nlink, so > our state is now iused=2, nlink=0. > > Now we evict the inode, which decrements ifree, so iused=1 and the inode > goes away. Oops, we just leaked an ifree. > > I /think/ the proper fix is to change shmem_link to decrement ifree only > if the inode has nonzero nlink, e.g. > > /* > * No ordinary (disk based) filesystem counts links as inodes; > * but each new link needs a new dentry, pinning lowmem, and > * tmpfs dentries cannot be pruned until they are unlinked. If > * we're linking an O_TMPFILE file into the tmpfs we can skip > * this because there's still only one link to the inode. > */ > if (inode->i_nlink > 0) { > ret = shmem_reserve_inode(inode->i_sb); > if (ret) > goto out; > } > > Says me who was crawling around poking at O_TMPFILE behavior all morning. Thanks for the Cc on that patch: I thought at first that you were coincidentally fixing up Matej's observation, but from its commit message no. That work is just a generic cleanup to suit XFS needs, and won't change the tmpfs behavior one way or the other. > Not sure if that's right; what happens to the old dentry? I'm relieved to see your "/think/" above and "Not sure" there :) Me too. It is so easy to get these counting things wrong, especially when distributed between the generic and the specific file system. I'm not going to attempt a pronouncement until I've had time to sink properly into it at the weekend, when I'll follow your guide and work it through - thanks a lot for getting this far, Darrick. Hugh > > --D > > > > If you need any more information, please let me know. > > > > > > And please CC me when replying, I am not subscribed to the list. > > > > > > Thanks and BR, > > > Matej