On Thu, Aug 08, 2024 at 11:40:07AM GMT, Morten Hein Tiljeset wrote:
> Hi folks,
>
> I'm trying to debug an issue that occurs sporadically in production where the
> ext4 filesystem on a device, say dm-1, is never fully closed. This is visible
> from userspace only via the existence of /sys/fs/ext4/dm-1 which cryptsetup
> uses to determine that the device is still mounted.
>
> My initial thought was that it was mounted in some mount namespace, but this is
> not the case. I've used a debugger (drgn) on /proc/kcore to find the
> superblock. I can see that this is kept alive by a single mount which looks
> like this (leaving out all fields that are NULL/empty lists):
>
> *(struct mount *)0xffff888af92c5cc0 = {
>         .mnt_parent = (struct mount *)0xffff888af92c5cc0,
>         .mnt_mountpoint = (struct dentry *)0xffff888850331980, // an application defined path
>         .mnt = (struct vfsmount){
>                 .mnt_root = (struct dentry *)0xffff888850331980, // note: same path as mnt_mountpoint
>                 .mnt_sb = (struct super_block *)0xffff88a89f7bc800, // points to the superblock I want cleaned up
>                 .mnt_flags = (int)134217760, // 0x8000020 = MNT_UMOUNT | MNT_RELATIME
>                 .mnt_userns = (struct user_namespace *)init_user_ns+0x0 = 0xffffffffb384b400,
>         },
>         .mnt_pcp = (struct mnt_pcp *)0x37dfbfa2c338,
>         .mnt_instance = (struct list_head){
>                 .next = (struct list_head *)0xffff88a89f7bc8d0,
>                 .prev = (struct list_head *)0xffff88a89f7bc8d0,
>         },
>         .mnt_devname = (const char *)0xffff88a7d0fe7cc0 = "/dev/mapper/<my device>_crypt", // maps to /dev/dm-1
>         .mnt_id = (int)3605,
> }

That's the root mount of the filesystem here.

>
> In particular I notice that the mount namespace is NULL. As far as I understand
> the only way to get this state is through a lazy unmount (MNT_DETACH). I can at
> least manage to create a similar state by lazily unmounting but keeping the
> mount alive with a shell with CWD inside the mountpoint.
>
> I've tried to search for the superblock pointer on cwd/root of all tasks, which
> works in my synthetic example but not for the real case. I've had similar
> results searching for the superblock pointer using drgn's fsrefs.py script[1]
> which has support for searching additional kernel data structures.

It's likely kept alive by a file descriptor that some process still has open.
IOW, try walking all /proc/<pid>/fd/<nr> and see whether any of them keeps it
alive.
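A minimal sketch of that walk, assuming Linux >= 3.15 (where each
/proc/<pid>/fdinfo/<fd> file carries a "mnt_id:" line) and root privileges so
other users' processes are readable. Instead of resolving the fd symlinks it
matches on the mount ID, so for the mount above you would pass 3605, the
.mnt_id from the drgn dump. The function name find_fds_on_mount is
hypothetical and not part of any existing tool.

```python
import os
import sys


def find_fds_on_mount(target_mnt_id, proc="/proc"):
    """Return (pid, fd, path) tuples for open fds whose fdinfo mnt_id matches.

    Hypothetical helper: scans /proc/<pid>/fdinfo/<fd> and compares the
    "mnt_id:" field against target_mnt_id.
    """
    hits = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fdinfo_dir = os.path.join(proc, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    info = f.read()
            except OSError:
                continue  # fd was closed while we were scanning
            for line in info.splitlines():
                if not line.startswith("mnt_id:"):
                    continue
                if int(line.split()[1]) == target_mnt_id:
                    try:
                        # Show where the fd points, for the human reading the output.
                        path = os.readlink(os.path.join(proc, pid, "fd", fd))
                    except OSError:
                        path = "?"
                    hits.append((int(pid), int(fd), path))
                break  # only one mnt_id line per fdinfo file
    return hits


if __name__ == "__main__":
    # Usage: find_fds_on_mount.py <mnt_id>, e.g. 3605 for the mount above.
    for pid, fd, path in find_fds_on_mount(int(sys.argv[1])):
        print(f"pid {pid} fd {fd} -> {path}")
```

Any hit names the process holding the lazily unmounted mount (and thus the
superblock) alive; an empty result means the reference comes from somewhere
other than a plain file descriptor.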