Hi,

thank you very much for your reply.

On 5/22/06, Stephen C. Tweedie <sct@xxxxxxxxxx> wrote:
> Hi,
>
> On Fri, 2006-05-19 at 21:27 +0200, Martin Jambor wrote:
> > The problem is that we require sync_fs() super operation to be the
> > last executed part of the sys_sync() syscall implementation. In other
> > words, we need all inodes to be written before sync_fs() is called.
>
> You can't really ever guarantee that.
Well, this was mainly an issue during sync, when this was really necessary, and calling s_op->sync_fs() from s_op->put_sb() did what we wanted. (On the other hand, the number of calls to sync_fs() made during an umount is rather large.) OTOH, now that we have a dirty inode list, I'll try to remove it because it should not be necessary. I understand that inodes can be marked dirty indefinitely while syncing and that I cannot wait until they are all written out. The problem really was to write out the inodes that should be written out at this moment before finalizing the sync, and mainly to avoid the issue described below.
> > What is more, I regularly encounter a situation when a part of a
> > directory operation (usually the directory) is written before
> > sync_fs() is called and the second part (a new inode, for example)
> > afterwards. This means that the filesystem is only half-synced and
> > also in an inconsistent state.
>
> Right. There's nothing at all to synchronise sync() with ongoing
> filesystem operations. If you're renaming a file from one directory to
> another, there's nothing in the VFS to keep both of those directory
> updates in a single sync.
OK, we have decided to enforce this explicitly. We have also taken into account that directory operations can even modify pages that are being written to the disk via BIOs as part of the sync. If any of those changes reached the disk, that would result in an inconsistent directory tree as well. Therefore we decided to make directory operations and sync mutually exclusive using an rw semaphore (with some extra care not to starve the sync). Does that sound reasonable?
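To make the idea concrete, here is a rough userspace sketch of the exclusion scheme. It is not our kernel code: it models the rw semaphore with a pthread mutex and condition variable, and every name in it (dir_sync_gate, dirop_begin, sync_begin and so on) is made up for illustration. Directory operations enter in shared mode, the sync enters exclusively, and a waiting sync blocks newly arriving directory operations, which is one possible way of not starving the sync.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the scheme; not kernel API. */
struct dir_sync_gate {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  active_dirops;  /* directory operations currently running */
    int  waiting_syncs;  /* syncs waiting for running dirops to drain */
    bool sync_active;    /* a sync currently holds the gate exclusively */
};

void gate_init(struct dir_sync_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->changed, NULL);
    g->active_dirops = 0;
    g->waiting_syncs = 0;
    g->sync_active = false;
}

/* A dirop may enter only if no sync is running or waiting. */
static bool dirop_may_enter(const struct dir_sync_gate *g)
{
    return !g->sync_active && g->waiting_syncs == 0;
}

/* Shared entry: blocks while a sync is running or queued. */
void dirop_begin(struct dir_sync_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!dirop_may_enter(g))
        pthread_cond_wait(&g->changed, &g->lock);
    g->active_dirops++;
    pthread_mutex_unlock(&g->lock);
}

/* Non-blocking variant: returns false instead of waiting. */
bool dirop_try_begin(struct dir_sync_gate *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = dirop_may_enter(g);
    if (ok)
        g->active_dirops++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

void dirop_end(struct dir_sync_gate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->active_dirops > 0);
    if (--g->active_dirops == 0)
        pthread_cond_broadcast(&g->changed);  /* wake a waiting sync */
    pthread_mutex_unlock(&g->lock);
}

/* Exclusive entry: announces itself first so new dirops stay out. */
void sync_begin(struct dir_sync_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->waiting_syncs++;
    while (g->sync_active || g->active_dirops > 0)
        pthread_cond_wait(&g->changed, &g->lock);
    g->waiting_syncs--;
    g->sync_active = true;
    pthread_mutex_unlock(&g->lock);
}

void sync_end(struct dir_sync_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->sync_active = false;
    pthread_cond_broadcast(&g->changed);  /* let dirops and syncs retry */
    pthread_mutex_unlock(&g->lock);
}
```

A directory operation such as a rename would bracket both of its directory updates with dirop_begin()/dirop_end(), and the sync path would bracket the write-out with sync_begin()/sync_end(), so the sync sees either all of a directory operation or none of it. The price of this particular variant is that directory operations stall for the whole duration of a sync.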
> Simply syncing the superblock last won't cure this, either.
No, we have never thought that, nor do we want to keep consistency at all times. We only need to have a consistent directory tree at the time we return from sync_fs(), so that a roll-forward utility can pick up from that point should the system crash.
> If you want to guarantee consistency on disk at all times, you really
> need to provide that atomicity within your filesystem itself.
Good point. Hope the rwsem will do it.

Thanks once more,

Martin