Chris Friesen, on 11/15/2012 05:35 PM wrote:
>> The easiest way to implement this fsync would involve three things:
>> 1. Schedule writes for all dirty pages in the fs cache that belong to
>> the affected file, wait for the device to report success, issue a cache
>> flush to the device (or request ordering commands, if available) to make
>> it tell the truth, and wait for the device to report success. AFAIK this
>> already happens, but without taking advantage of any request ordering
>> commands.
>> 2. The requesting thread returns as soon as the kernel has identified
>> all data that will be written back. This is new, but pretty similar to
>> what AIO already does.
>> 3. No write is allowed to enqueue any requests at the device that
>> involve the same file, until all outstanding fsync complete [3]. This is
>> new.
>
> This sounds interesting as a way to expose some useful semantics to userspace.
> I assume we'd need to come up with a new syscall or something since it doesn't
> match the behaviour of posix fsync().
This is how I would export the cache sync and request ordering abstractions to
user space:
For async IO (io_submit() and friends) I would extend struct iocb with a flags
field that lets the caller set the required capabilities per iocb: whether the
request is FUA, a full cache sync, immediate [1] or not, ORDERED or not, or all
of those at the same time.
For regular read()/write() I would add one more flag to the "flags" parameter
of sync_file_range(): whether this sync is immediate or not.
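
A minimal sketch of how a caller might build that flags argument. The three
SYNC_FILE_RANGE_* values are the existing Linux ones (defined here only as a
fallback); SKETCH_SYNC_FILE_RANGE_IMMEDIATE is a hypothetical name with an
arbitrary placeholder bit, not an existing flag.

```c
#include <assert.h>

/* Existing sync_file_range() flag values from the Linux ABI, defined here
 * only if the system headers did not already provide them. */
#ifndef SYNC_FILE_RANGE_WAIT_BEFORE
#define SYNC_FILE_RANGE_WAIT_BEFORE 1
#endif
#ifndef SYNC_FILE_RANGE_WRITE
#define SYNC_FILE_RANGE_WRITE       2
#endif
#ifndef SYNC_FILE_RANGE_WAIT_AFTER
#define SYNC_FILE_RANGE_WAIT_AFTER  4
#endif

/* Hypothetical new flag; the name and bit value are placeholders. */
#define SKETCH_SYNC_FILE_RANGE_IMMEDIATE 0x100

static_assert(((SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                SYNC_FILE_RANGE_WAIT_AFTER) &
               SKETCH_SYNC_FILE_RANGE_IMMEDIATE) == 0,
              "placeholder bit must not clash with existing flags");

/* Build the flags for a write-and-wait sync of a range, optionally
 * requesting the proposed "immediate" behaviour. */
unsigned int sketch_sync_flags(int immediate)
{
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;
    if (immediate)
        flags |= SKETCH_SYNC_FILE_RANGE_IMMEDIATE;
    return flags;
}
```

The result would be passed as the last argument of sync_file_range(fd, offset,
nbytes, flags). A kernel without such a flag would be expected to reject the
unknown bit, so this only shows the shape of the interface.
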
To enforce ordering rules I would add one more command to fcntl(). It would
mark the most recently submitted write on this fd as ORDERED.
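
The fcntl() idea can be illustrated with a user-space toy model. This is not
kernel code: F_SKETCH_SET_ORDERED and all sketch_* names are hypothetical, and
the model assumes ORDERED means that no other write may be reordered across
the marked one (similar to the SCSI ORDERED task attribute).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_WRITES 16
static_assert(SKETCH_MAX_WRITES > 0, "queue needs at least one slot");

/* Hypothetical fcntl() command number; a placeholder value only. */
#define F_SKETCH_SET_ORDERED 0x7001

struct sketch_write {
    unsigned long offset;
    bool ordered;
};

/* Per-fd model state: writes kept in submission order. */
struct sketch_fd {
    struct sketch_write writes[SKETCH_MAX_WRITES];
    size_t count;
};

/* Record a submitted write; returns its index, or -1 if the queue is full. */
int sketch_submit_write(struct sketch_fd *f, unsigned long offset)
{
    if (f->count >= SKETCH_MAX_WRITES)
        return -1;
    f->writes[f->count].offset = offset;
    f->writes[f->count].ordered = false;
    return (int)f->count++;
}

/* Model of the proposed command: mark the latest submitted write ORDERED.
 * Returns 0 on success, -1 for an unknown command or an empty queue. */
int sketch_fcntl(struct sketch_fd *f, int cmd)
{
    if (cmd != F_SKETCH_SET_ORDERED || f->count == 0)
        return -1;
    f->writes[f->count - 1].ordered = true;
    return 0;
}

/* May writes a and b (submission indices, a < b) be reordered? Under the
 * assumed semantics, not if any write in [a, b] is ORDERED. */
bool sketch_may_reorder(const struct sketch_fd *f, size_t a, size_t b)
{
    for (size_t i = a; i <= b && i < f->count; i++)
        if (f->writes[i].ordered)
            return false;
    return true;
}
```

In this model, writes submitted after the marked one may still be reordered
among themselves, but none of them may pass it.
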
Taken together, these should provide the requested functionality in a simple,
effective, unambiguous and backward-compatible manner.
Vlad
1. See my other e-mail from today for what an immediate cache sync is.