On Fri, Aug 2, 2024 at 12:44 AM Yifei Liu wrote:
>
> Dear NILFS2 Maintainers,
>
> I hope this message finds you well. I am writing to report a potential
> bug we have encountered in NILFS2 related to disk space management
> while testing it with our model checking tool, Metis. The issue arises
> after performing the following operations:
>
> Steps to Reproduce:
> 1. Mount the NILFS2 file system.
> 2. Continuously create files in the NILFS2 file system until the disk
> space is completely used up (ENOSPC).
> 3. Delete all the files created in the previous step.
> 4. Sleep for 1 minute to allow the cleanerd to run.
> 5. Repeat steps 2-4 a few times.
>
> Note: The protection_period parameter in nilfs_cleanerd.conf has been
> changed from the default 3600 seconds to 10 seconds for quicker
> observation of the bug.
>
> Expected Behavior: After deleting all files, the disk usage should
> decrease to zero or near zero, reflecting the freed space.
>
> Observed Behavior: Occasionally, after deleting the files, the file
> system remains stuck at a high usage (88% or 100% in our experiments)
> and does not free any space. When we try to create another file, it
> fails and reports "no space left on the device". We also tried
> manually running the cleanerd once the system’s space usage was stuck
> at high percentages; even though some of the segments appear to be not
> protected and have 0% live blocks, according to the lssu output, the
> space was still not cleaned. This issue occurs sporadically and is not
> consistent across all tests (thus, we suspect it may be a race
> condition).
>
> We have created a GitHub repository containing a detailed README, the
> script used to generate this problem, an example log generated in one
> of our experiments, and the necessary files. Running this script and
> obtaining all the outputs takes approximately 10 minutes. The script
> sets up a ramdisk and mounts NILFS2 with the minimum possible size of
> 1028 KiB.
> Here is the link to the GitHub repository:
> https://github.com/sbu-fsl/nilfs2-full-space.git.
>
> I would appreciate any insights or assistance you could provide
> regarding this issue. If you require any further information, logs, or
> specific test cases, please let me know, and I will be happy to
> provide them.
>
> Best regards,
>
> Yifei Liu
> File systems and Storage Lab (Stony Brook University)

Hi Yifei,

I checked what your script was doing, and one thing I noticed was that
nilfs_cleanerd seemed to be started twice.

nilfs_cleanerd is designed to be started automatically by the
mount.nilfs2 helper program when you mount a device with the mount
command, and to be shut down by the umount.nilfs2 helper program
before the unmount system call is actually issued when you unmount the
device with the umount command. In other words, it is meant to be a
resident program that runs in the background while the device is
mounted.

In your script, you run nilfs_cleanerd manually after mounting and
writing, so at that point it seems that there are two nilfs_cleanerd
processes, and both of them are requesting GC on the same device. If
that happens, fatal situations that would destroy the file system are
still prevented, but normal GC operation is not guaranteed.

So, could you please check the running processes with the ps command?
If nilfs_cleanerd is started via the mount command, it should not be
started twice for the same device.

If you want to run GC manually, use the "nilfs-clean" command to
activate nilfs_cleanerd as follows:

  # nilfs-clean -p 0 $DEVICE

If you really want to run nilfs_cleanerd manually, specify the "nogc"
mount option when mounting:

  # mount -o nogc $DEVICE $MOUNT_POINT

In this case, you need to kill nilfs_cleanerd manually when
unmounting.
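For the ps check suggested above, a small helper like the following could flag a duplicate cleanerd. This is only a sketch: count_cleanerd is a hypothetical function, not part of nilfs-utils. It reads "ps -eo args=" output and matches on the program name alone, so it assumes a single NILFS2 device is mounted, as in the test script.

```shell
#!/bin/sh
# Hypothetical helper, not part of nilfs-utils: count running
# nilfs_cleanerd processes from "ps -eo args=" style input on stdin
# (one command line per line). It matches on the program name only,
# so it assumes a single NILFS2 device is mounted, as in the test script.
count_cleanerd() {
    awk '
        {
            prog = $1
            sub(/.*\//, "", prog)   # drop any leading path, e.g. /sbin/
            if (prog == "nilfs_cleanerd") n++
        }
        END { print n + 0 }         # print 0 when nothing matched
    '
}

# Example: warn if more than one cleanerd appears to be running.
n=$(ps -eo args= | count_cleanerd)
if [ "$n" -gt 1 ]; then
    echo "warning: $n nilfs_cleanerd processes are running" >&2
fi
```

Matching on the first word of each command line, rather than piping ps through grep, avoids counting the grep process itself.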
Depending on your environment, you may need to specify the file system
type explicitly:

  # mount -t nilfs2 -o nogc $DEVICE $MOUNT_POINT

Also, the version of nilfs-utils you used is old, so to rule out known
bugs, it would be helpful if you could test with the latest release,
nilfs-utils 2.2.11 (or nilfs-utils 2.3.0-dev). You can download the
latest tarball from the download page [1] or from GitHub as described
in [2].

[1] https://nilfs.sourceforge.io/en/download.html
[2] https://nilfs.sourceforge.io/en/git_repos.html

Thank you.
Ryusuke Konishi