Re: Potential Bug in NILFS2: Disk Space Not Freed After File Deletion

On Fri, Aug 2, 2024 at 12:44 AM Yifei Liu wrote:
>
> Dear NILFS2 Maintainers,
>
> I hope this message finds you well. I am writing to report a potential
> bug we have encountered in NILFS2 related to disk space management
> while testing it with our model checking tool, Metis. The issue arises
> after performing the following operations:
>
> Steps to Reproduce:
> 1. Mount the NILFS2 file system.
> 2. Continuously create files in the NILFS2 file system until the disk
> space is completely used up (ENOSPC).
> 3. Delete all the files created in the previous step.
> 4. Sleep for 1 minute to allow the cleanerd to run.
> 5. Repeat steps 2-4 a few times.
>
> Note: The protection_period parameter in nilfs_cleanerd.conf has been
> changed from the default 3600 seconds to 10 seconds for quicker
> observation of the bug.
>
> Expected Behavior: After deleting all files, the disk usage should
> decrease to zero or near zero, reflecting the freed space.
>
> Observed Behavior: Occasionally, after deleting the files, the file
> system remains stuck at a high usage (88% or 100% in our experiments)
> and does not free any space. When we try to create another file, it
> fails and reports "no space left on the device". We also tried
> manually running the cleanerd once the system’s space usage was stuck
> at high percentages; even though some of the segments appear to be not
> protected and have 0% live blocks, according to the lssu output, the
> space was still not cleaned. This issue occurs sporadically and is not
> consistent across all tests (thus, we suspect it may be a race
> condition).
>
> We have created a GitHub repository containing a detailed README, the
> script used to generate this problem, an example log generated in one
> of our experiments, and the necessary files. Running this script and
> obtaining all the outputs takes approximately 10 minutes. The script
> sets up a ramdisk and mounts NILFS2 with the minimum possible size of
> 1028 KiB. Here is the link to the GitHub repository:
> https://github.com/sbu-fsl/nilfs2-full-space.git.
>
> I would appreciate any insights or assistance you could provide
> regarding this issue. If you require any further information, logs, or
> specific test cases, please let me know, and I will be happy to
> provide them.
>
> Best regards,
>
> Yifei Liu
> File systems and Storage Lab (Stony Brook University)

Hi Yifei,

I checked what your script was doing, and one thing I noticed was that
nilfs_cleanerd seemed to be started twice.

nilfs_cleanerd is designed to be automatically started via the
mount.nilfs2 helper program when you mount a device with the mount
command, and to be shut down via the umount.nilfs2 helper program
before actually issuing the unmount system call when you try to
unmount a device with the umount command.

Basically, this program is designed to be a resident program that runs
in the background while the device is mounted.

In your script, you run nilfs_cleanerd manually after mounting and
writing, so at this point, it seems that there are two nilfs_cleanerd
processes, and both of them are requesting GC on the same device.

If that happens, fatal situations that would destroy the file system
are still prevented, but normal GC operation is not guaranteed.  So,
could you please check the running processes with the ps command?
If you start nilfs_cleanerd only via the mount command, it should not
be started twice for the same device.
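
For reference, here is a minimal sketch of that ps check.  The helper
name count_cleaners is made up for illustration, and it assumes that the
device path appears literally among the arguments of each
nilfs_cleanerd process (a cleaner started with a different spelling of
the path, such as a symlink, would be missed):

```shell
#!/bin/sh
# count_cleaners DEVICE  (hypothetical helper, not part of nilfs-utils)
# Reads "PID ARGS..." lines on stdin, in the format printed by
# `ps -eo pid=,args=`, and prints how many nilfs_cleanerd processes
# have DEVICE among their arguments.
count_cleaners() {
    awk -v dev="$1" '
        {
            is_cleaner = 0
            has_dev = 0
            # $1 is the PID, $2 is the command, possibly with a path.
            n = split($2, parts, "/")
            if (parts[n] == "nilfs_cleanerd")
                is_cleaner = 1
            # Assumption: the device is passed as a separate argument.
            for (i = 3; i <= NF; i++)
                if ($i == dev)
                    has_dev = 1
            if (is_cleaner && has_dev)
                count++
        }
        END { print count + 0 }
    '
}

# Usage ($DEVICE is a placeholder for your NILFS2 device):
#   ps -eo pid=,args= | count_cleaners "$DEVICE"
```

Any result greater than 1 means more than one cleaner is attached to
that device.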

If you want to run GC manually, use the "nilfs-clean" command to
activate nilfs_cleanerd as follows:

# nilfs-clean -p 0 $DEVICE

If you really want to run nilfs_cleanerd manually, specify the "nogc"
mount option when mounting:

# mount -o nogc $DEVICE $MOUNT_POINT

In this case, you need to manually kill nilfs_cleanerd when unmounting.

Depending on your environment, you may need to specify the file system manually:

# mount -t nilfs2 -o nogc $DEVICE $MOUNT_POINT
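
Putting the "nogc" pieces together, a test script could look roughly
like the sketch below.  This is only an outline under some assumptions:
the function names and the DRY_RUN switch are made up for illustration,
nilfs_cleanerd is assumed to accept the device and mount point as
positional arguments and to put itself in the background, and the
pkill call stops every nilfs_cleanerd on the machine, which is only
reasonable on a test box with a single NILFS2 mount:

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# manual_gc_start DEVICE MOUNT_POINT
# Mount with "nogc" so that mount.nilfs2 does not start a cleaner,
# then start exactly one cleaner by hand.
manual_gc_start() {
    run mount -t nilfs2 -o nogc "$1" "$2" || return 1
    # Assumption: device and mount point are positional arguments.
    run nilfs_cleanerd "$1" "$2"
}

# manual_gc_stop MOUNT_POINT
# With "nogc", the cleaner has to be stopped by hand before unmounting.
manual_gc_stop() {
    # Assumption: only one NILFS2 file system is mounted, because this
    # sends SIGTERM to all nilfs_cleanerd processes.
    run pkill -TERM -x nilfs_cleanerd
    run umount "$1"
}

# Usage (placeholders; set DRY_RUN=1 first to only print the commands):
#   manual_gc_start "$DEVICE" "$MOUNT_POINT"
#   ... create and delete files ...
#   manual_gc_stop "$MOUNT_POINT"
```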

Also, the version of nilfs-utils you used is old, so in order to rule
out known bugs, it would be helpful if you could test with the latest
version, nilfs-utils 2.2.11 (or nilfs-utils 2.3.0-dev).

You can download the tarball of the latest version from the site [1],
or get the source from GitHub as described in [2].

[1] https://nilfs.sourceforge.io/en/download.html
[2] https://nilfs.sourceforge.io/en/git_repos.html


Thank you.

Ryusuke Konishi




