Re: Sync issue

"Martin Jambor" <jambormartin@xxxxxxxxx> · Wed, 24 May 2006 03:36:43 +0200

Hi,

thank you very much for your reply.

On 5/22/06, Stephen C. Tweedie <sct@xxxxxxxxxx> wrote:
Hi,

On Fri, 2006-05-19 at 21:27 +0200, Martin Jambor wrote:

> The problem  is that  we require sync_fs()  super operation to  be the
> last executed part of the sys_sync() syscall implementation.  In other
> words, we need all inodes to be written before sync_fs() is called.

You can't really ever guarantee that.

Well, this was mainly an issue during sync, when it was really necessary,
and calling s_op->sync_fs() from s_op->put_sb() did what we wanted. (On
the other hand, the number of sync_fs() calls made during an umount is
rather large.)

OTOH, now that we have a dirty inode list, I'll try to remove that call
because it should no longer be necessary.

I understand that inodes can be marked dirty indefinitely while syncing
and that I cannot wait until they are all written out. The problem
really was to write out the inodes that should be written out at this
moment before finalizing the sync, and mainly to avoid the issue
described below.
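
One way to pin down "the inodes that should be written out at this
moment" is to detach the dirty list when the sync starts and write only
that snapshot; anything dirtied later lands on a fresh list and waits
for the next sync. Below is a minimal user-space sketch of that idea,
not our actual code: struct dirty_list, dirty_list_snapshot() and
write_snapshot() are hypothetical names, and a pthread mutex stands in
for whatever lock protects the list in the kernel.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode sitting on the dirty list. */
struct dirty_inode {
    unsigned long ino;
    struct dirty_inode *next;
};

/* Hypothetical per-filesystem dirty list, protected by a mutex. */
struct dirty_list {
    pthread_mutex_t lock;
    struct dirty_inode *head;
};

void dirty_list_init(struct dirty_list *dl)
{
    pthread_mutex_init(&dl->lock, NULL);
    dl->head = NULL;
}

/* Mark an inode dirty by pushing it onto the front of the list. */
void dirty_list_add(struct dirty_list *dl, struct dirty_inode *inode)
{
    pthread_mutex_lock(&dl->lock);
    inode->next = dl->head;
    dl->head = inode;
    pthread_mutex_unlock(&dl->lock);
}

/*
 * Detach everything that is dirty right now. Inodes dirtied after this
 * call go onto the (now empty) list and are not part of this sync.
 */
struct dirty_inode *dirty_list_snapshot(struct dirty_list *dl)
{
    struct dirty_inode *snap;

    pthread_mutex_lock(&dl->lock);
    snap = dl->head;
    dl->head = NULL;
    pthread_mutex_unlock(&dl->lock);
    return snap;
}

/*
 * Write out every inode in the snapshot and return how many were
 * handled. write_fn is a caller-supplied callback and may be NULL.
 */
size_t write_snapshot(struct dirty_inode *snap,
                      void (*write_fn)(struct dirty_inode *))
{
    size_t written = 0;

    while (snap != NULL) {
        struct dirty_inode *next = snap->next;

        snap->next = NULL;          /* inode is off every list now */
        if (write_fn != NULL)
            write_fn(snap);
        written++;
        snap = next;
    }
    return written;
}
```

The point of the snapshot is that the sync has a bounded amount of work
even if inodes keep getting dirtied while it runs.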

> What  is more,  I regularly  encounter a  situation when  a part  of a
> directory  operation   (usually  the  directory)   is  written  before
> sync_fs() is  called and  the second part  (a new inode,  for example)
> afterwards.  This  means that the  filesystem is only  half-synced and
> also in an inconsistent state.

Right.  There's nothing at all to synchronise sync() with ongoing
filesystem operations.  If you're renaming a file from one directory to
another, there's nothing in the VFS to keep both of those directory
updates in a single sync.

OK, we have decided to enforce this explicitly. We have also taken into
account that directory operations can even modify pages that are being
submitted to the disk as BIOs as part of the sync. If any of those
changes reached the disk, that would result in an inconsistent directory
tree as well. Therefore we decided to mutually exclude directory
operations and sync using an rw semaphore (with some extra care not to
starve the sync). Does that sound reasonable?
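
To make the scheme concrete, here is a rough user-space sketch of the
exclusion, not the actual filesystem code. Directory operations act as
readers and may run concurrently; sync acts as the single writer. In the
kernel this would correspond to down_read()/up_read() around directory
operations and down_write()/up_write() around the sync. The names
fs_sync_guard, dirop_begin(), dirop_end(), sync_begin() and sync_end()
are hypothetical, and the "do not starve the sync" part is modelled here
by making new directory operations wait as soon as a sync is queued,
which is only one possible policy.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical guard: many directory operations XOR one sync. */
struct fs_sync_guard {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int active_dirops;   /* directory operations currently running */
    int sync_waiting;    /* syncs queued behind running operations */
    int sync_active;     /* nonzero while a sync has exclusive access */
};

void guard_init(struct fs_sync_guard *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->changed, NULL);
    g->active_dirops = 0;
    g->sync_waiting = 0;
    g->sync_active = 0;
}

/* Reader side: called at the start of every directory operation. */
void dirop_begin(struct fs_sync_guard *g)
{
    pthread_mutex_lock(&g->lock);
    /* Yield to a running or queued sync so the sync cannot starve. */
    while (g->sync_active || g->sync_waiting > 0)
        pthread_cond_wait(&g->changed, &g->lock);
    g->active_dirops++;
    pthread_mutex_unlock(&g->lock);
}

void dirop_end(struct fs_sync_guard *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->active_dirops > 0);
    g->active_dirops--;
    if (g->active_dirops == 0)
        pthread_cond_broadcast(&g->changed);   /* let a queued sync run */
    pthread_mutex_unlock(&g->lock);
}

/* Writer side: called before the sync starts writing anything out. */
void sync_begin(struct fs_sync_guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->sync_waiting++;                          /* blocks new dirops */
    while (g->sync_active || g->active_dirops > 0)
        pthread_cond_wait(&g->changed, &g->lock);
    g->sync_waiting--;
    g->sync_active = 1;
    pthread_mutex_unlock(&g->lock);
}

void sync_end(struct fs_sync_guard *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->sync_active);
    g->sync_active = 0;
    pthread_cond_broadcast(&g->changed);        /* wake dirops and syncs */
    pthread_mutex_unlock(&g->lock);
}
```

With this policy a directory operation either completes entirely before
the sync begins or starts entirely after it ends, so the sync never sees
half of one. The price is that back-to-back syncs can delay directory
operations instead, and a directory operation must not nest inside
another one on the same thread while a sync is queued.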

Simply syncing the superblock last won't cure this, either.

No, we never thought that, nor do we want to keep consistency at all
times. We only need to have a consistent directory tree at the time we
return from sync_fs(), so that a roll-forward utility can pick up from
that point should the system crash.

If you want to guarantee consistency on disk at all times, you really
need to provide that atomicity within your filesystem itself.

Good point. Hope the rwsem will do it.

Thanks once more,

Martin
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html