Re: [PATCH RFC] nilfs2: continuous snapshotting file system

Jörn Engel <joern@xxxxxxxxx> · Fri, 29 Aug 2008 12:45:00 +0200

On Fri, 29 August 2008 15:29:35 +0900, Ryusuke Konishi wrote:
> On Wed, 27 Aug 2008 20:19:04 +0200,  Jorn Engel wrote:
> 
> >Do you do wear leveling or scrubbing?
> 
> NILFS does not support scrubbing. (as you guessed)
> Under the current GC daemon, it writes logs sequentially and circularly
> in the partition, and as you know, this leads to the wear levelling
> except for superblock.

I am a bit confused here.  My picture of log-structured filesystems was
always that writes go round-robin _within_ a segment, but new segments
can be picked in any order.  So there is a good chance of some segments
simply never being picked and others constantly being reused.

If nilfs works the same way, it will by design spread the writes
somewhat better than, say, ext3, but it can still suffer local
wear-out, e.g. if 98% of the filesystem is full and the remaining 2%
receives a high write load.

True wear leveling requires a bit more work.  Either some probabilistic
garbage collection of any random segment, as jffs2 does, or storing some
write counters and keeping them roughly level as logfs does.
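
To illustrate the counter-based variant, here is a rough sketch.  It is
not taken from logfs or nilfs; the struct, the function names and the
max_delta threshold are all made up for the example.  New writes go to
the least-worn free segment, and a segment full of cold data is handed
to GC once its counter lags too far behind the most-worn one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment bookkeeping, for illustration only. */
struct segment {
	uint32_t erase_count;	/* how often this segment was rewritten */
	int is_free;		/* nonzero if it holds no live data */
};

/* Pick the free segment with the lowest erase count, -1 if none is free. */
int pick_free_segment(const struct segment *segs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!segs[i].is_free)
			continue;
		if (best < 0 || segs[i].erase_count < segs[best].erase_count)
			best = (int)i;
	}
	return best;
}

/*
 * Find an in-use segment whose erase count lags the most-worn segment
 * by more than max_delta.  GC would move its (cold) data elsewhere so
 * the segment rejoins the rotation.  Returns -1 if wear is level enough.
 */
int pick_wear_level_victim(const struct segment *segs, size_t n,
			   uint32_t max_delta)
{
	uint32_t max = 0;
	int coldest = -1;

	for (size_t i = 0; i < n; i++) {
		if (segs[i].erase_count > max)
			max = segs[i].erase_count;
		if (segs[i].is_free)
			continue;
		if (coldest < 0 ||
		    segs[i].erase_count < segs[coldest].erase_count)
			coldest = (int)i;
	}
	if (coldest >= 0 && max - segs[coldest].erase_count > max_delta)
		return coldest;
	return -1;
}
```

Without the second function, the 98%-full case above would keep
cycling through the same few free segments no matter how carefully the
first one chooses among them.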

> >How does garbage collection work?  In particular, when the filesystem
> >runs out of free space, do you depend on the userspace daemon to make
> >some policy decisions or can the kernel make progress on its own?
> 
> The GC of NILFS depends on the userspace daemon to make policy decisions.
> NILFS cannot reclaim disk space on its own though it can work 
> (i.e. read, write, or do other operations) without the daemon.
> After it runs out of free space, disk full errors will be returned
> until GC makes new space.

This looks problematic.  In logfs I was very careful to define a
"filesystem full" condition that is independent of GC.  So with a single
writer, -ENOSPC always means the filesystem is full and the only way to
gain some free space is by deleting data again.
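
One way to get such a GC-independent condition is to account for live
data rather than for clean segments.  The following is only a sketch
under that assumption, not actual logfs code; fs_space, reserve_space()
and release_space() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical space accounting, for illustration only. */
struct fs_space {
	uint64_t capacity;	/* total usable bytes */
	uint64_t gc_reserve;	/* held back so GC can always make progress */
	uint64_t live;		/* bytes of live (or already promised) data */
};

/*
 * Decide -ENOSPC purely from the amount of live data.  Whether GC has
 * already cleaned enough segments does not enter the decision, so the
 * answer cannot change unless data is deleted.
 * Invariant: live <= capacity - gc_reserve.
 */
int reserve_space(struct fs_space *fs, uint64_t bytes)
{
	uint64_t limit = fs->capacity - fs->gc_reserve;

	if (bytes > limit - fs->live)
		return -ENOSPC;
	fs->live += bytes;
	return 0;
}

/* Called when data is deleted or truncated. */
void release_space(struct fs_space *fs, uint64_t bytes)
{
	fs->live -= bytes;
}
```

With this scheme a writer that got -ENOSPC will keep getting it until
someone calls release_space(), i.e. until data is deleted.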

In nilfs it appears possible that a single writer receives -ENOSPC and
can simply keep retrying until - magically - there is space again
because the GC daemon woke up and freed some more.  That is unexpected,
to say the least.

Which is also one of the reasons why I don't like the userspace daemon
approach very much.  Decent behaviour now requires that you block the
writes, wake up the userspace daemon and wait for it to do its job.  Or
you would have to implement a fallback GC in kernelspace that gets
called whenever -ENOSPC would otherwise be returned.
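
Either variant boils down to the same allocation loop.  A minimal
sketch, assuming a hypothetical run_gc hook that stands for "wake the
cleaner and wait for one pass" (userspace daemon or in-kernel fallback
alike); none of these names exist in nilfs.

```c
#include <errno.h>

/* Hypothetical filesystem state, for illustration only. */
struct log_fs {
	unsigned clean_segments;	/* ready to be written */
	unsigned reclaimable_segments;	/* dirty, but GC could free them */
	/* wake the cleaner, wait for one pass, return segments freed */
	unsigned (*run_gc)(struct log_fs *fs);
};

/*
 * Allocate one segment for writing.  The writer only sees -ENOSPC when
 * GC has nothing left to reclaim or fails to make progress; otherwise
 * it blocks inside run_gc until a segment becomes available.
 */
int alloc_segment(struct log_fs *fs)
{
	while (fs->clean_segments == 0) {
		if (fs->reclaimable_segments == 0)
			return -ENOSPC;	/* really full */
		if (fs->run_gc(fs) == 0)
			return -ENOSPC;	/* no progress, avoid spinning */
	}
	fs->clean_segments--;
	return 0;
}

/* Toy cleaner used for the example: frees one segment per pass. */
unsigned gc_one_segment(struct log_fs *fs)
{
	if (fs->reclaimable_segments == 0)
		return 0;
	fs->reclaimable_segments--;
	fs->clean_segments++;
	return 1;
}
```

The point of the loop is that the caller never observes the window in
which space is merely not cleaned yet.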

> But, usually the GC will make enough disk space in the background
> before that occurs.

Usually, yes.  You just have to make sure that in the unusual cases the
filesystem continues to behave correctly. ;)

Jörn

-- 
Homo Sapiens is a goal, not a description.
-- unknown