Re: nilfs2 doesn't garbage collect checkpoints for me

Hi,


On Thursday 26 of May 2011 20:11:55 you wrote:
> I'm testing nilfs2 and other fs's for use on cheap flash cards, trying
> to avoid writing same location all the time.

I'm using nilfs2 on a small server with a cheap 16 GB SSD extracted from an 
Eee PC for the same reason; it works great.



> My test program makes lots of small sqlite transactions which sqlite
> syncs to disk.
> In less than 2000 transaction 1GB nilfs2 volume ran out of disk space.
> tried unmount, mount again, didn't help
> block device is nbd, works with with other fs's
> 
> lscp shows there are 7121 checkpoints and somehow old ones are not
> removed automatically.

First off, the default configuration of nilfs_cleanerd is to keep all 
checkpoints for at least one hour (3600 seconds); see the `protection_period' 
option in /etc/nilfs_cleanerd.conf. For testing you may want to lower the 
protection period to just a few seconds and see if that helps, either via the 
config file (then send the daemon a SIGHUP so it reloads the config) or via 
the `-p SECONDS' command-line argument (see the manpage).
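As a concrete sketch of the config-file route (the commands below work on a 
throwaway copy of the config so they are safe to try; on a real system you 
would edit /etc/nilfs_cleanerd.conf itself, and the `10' is just an example 
value):

```shell
# Fabricate a minimal cleaner config for demonstration purposes;
# the protection_period and log_priority keys match the real config file.
cat > /tmp/nilfs_cleanerd.conf <<'EOF'
protection_period	3600
log_priority	info
EOF

# Lower the protection period to 10 seconds (GNU sed assumed).
sed -i 's/^protection_period.*/protection_period\t10/' /tmp/nilfs_cleanerd.conf
grep '^protection_period' /tmp/nilfs_cleanerd.conf

# Then make the running cleaner re-read its config:
#   kill -HUP "$(pgrep nilfs_cleanerd)"
# or restart it with an explicit period instead:
#   nilfs_cleanerd -p 10 /dev/YOUR_DEVICE
```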

To see what's going on, you may want to (temporarily) change `log_priority' in 
the config file to `debug'; in /var/log/debug (or wherever your syslog routes 
debug messages) you should then see messages describing the actions of 
nilfs_cleanerd.


Example:

May 26 20:23:53 blitz nilfs_cleanerd[3198]: wake up
May 26 20:23:53 blitz nilfs_cleanerd[3198]: ncleansegs = 1175
May 26 20:23:53 blitz nilfs_cleanerd[3198]: 4 segments selected to be cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: protected checkpoints = 
[156725,157003] (protection period >= 1306430633)
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1844 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1845 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1846 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: segment 1847 cleaned
May 26 20:23:53 blitz nilfs_cleanerd[3198]: wait 0.488223000


where `ncleansegs' is the number of clean (free) segments you already have, 
and `protected checkpoints' indicates the range of checkpoint numbers still 
under protection (due to the `protection_period' setting).


In any case, my understanding is that in a typical DB each transaction (which 
may be every single statement, if you don't BEGIN/COMMIT explicitly) causes an 
fsync(), which creates a new checkpoint. On a small drive that *may* create so 
many checkpoints in a short time that they don't get GC'd before the drive 
fills up. Not sure yet how to work around that.
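That said, one common SQLite-side mitigation (not something I've verified on 
NILFS specifically) is to batch many statements into one explicit transaction, 
so sqlite issues a single sync per batch rather than one per statement. A 
sketch using the sqlite3 CLI on a made-up throwaway database:

```shell
# One explicit transaction around 1000 inserts -> one commit/sync for the
# whole batch, instead of 1000 separate autocommit transactions.
DB=/tmp/batch-test.db
rm -f "$DB"

sqlite3 "$DB" 'CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);'

{
  echo 'BEGIN;'
  for i in $(seq 1 1000); do
    echo "INSERT INTO t (v) VALUES ('row $i');"
  done
  echo 'COMMIT;'
} | sqlite3 "$DB"

sqlite3 "$DB" 'SELECT count(*) FROM t;'   # 1000
```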



Two more possible sources of the problem:
1) GC used to break in a certain scenario: the FS could become internally 
inconsistent (no data loss, but it wouldn't perform GC anymore) if two or more 
nilfs_cleanerds were processing it at the same time. It's probably fixed in 
the most recent patches. To check whether that's your case, look at the output 
of the `dmesg' command; any NILFS problems will show up there.
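For example (the fallback message is just for readability when nothing 
matches):

```shell
# Search the kernel ring buffer for NILFS-related messages, case-insensitively.
dmesg 2>/dev/null | grep -i nilfs || echo "no NILFS messages in dmesg"
```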

2) a new `nilfs_cleanerd' process may become stuck on a semaphore if you 
killed the old one hard (for example, with kill -9). That used to leave a 
stale auxiliary file in /dev/shm/, like /dev/shm/sem.nilfs-cleaner-2067. To 
check whether that's your case, run nilfs_cleanerd through strace, like:

# strace -f nilfs_cleanerd /dev/YOUR_FILESYSTEM

If it hangs at some point on a futex() call, that's it. A brute-force but 
sure-fire fix is to kill all instances of nilfs_cleanerd and remove the files 
matching /dev/shm/sem.nilfs-cleaner-*.
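In commands, the recipe looks like this (we simulate the stale file under 
/tmp so the sketch is safe to run; on a real system the files live in 
/dev/shm/, and the `2067' suffix is just the example from above):

```shell
# Simulate a stale semaphore file left behind by a hard-killed cleaner.
touch /tmp/sem.nilfs-cleaner-2067

# Stop every cleaner instance, then delete the stale semaphore files.
pkill -9 nilfs_cleanerd || true   # no-op if none are running
rm -f /tmp/sem.nilfs-cleaner-*    # on a real system: /dev/shm/sem.nilfs-cleaner-*

# Afterwards, start a fresh cleaner:
#   nilfs_cleanerd /dev/YOUR_FILESYSTEM
ls /tmp/sem.nilfs-cleaner-* 2>/dev/null || echo "stale semaphores gone"
```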


Hope that helps somehow~


-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

