On 01/23/2011 06:17 AM, Ted Ts'o wrote:
>> that's why a fakefsync mount option would be nice to have.
> Yes, except the file system developers don't want to take on the moral
> liability of system administrators using such a mount option
> incorrectly.
I understand
> The fsync waits for all data to be sent to disk. It has to; since we
> can't easily, given the current disk protocols, distinguish between
> the 5 MB of I/O that pertains to file A which is being fsync'ed, but
> not the 20 MB of I/O pertaining to file B which is going on in the
> background.
So it's a queue drain + cache flush, right?
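
To see the cost from user space, a minimal sketch like the one below times a single fsync on a small file. The helper name timed_fsync, the file name and the 4 KiB payload are my own assumptions for illustration; the number it prints depends entirely on what else is in flight on the disk at that moment.

```python
import os
import tempfile
import time


def timed_fsync(path, payload):
    """Write payload to path, fsync it, and return the fsync time in seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)  # blocks until the kernel reports the file's data as flushed
        return time.monotonic() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = timed_fsync(os.path.join(tmp, "small.dat"), b"x" * 4096)
        print(f"fsync of 4 KiB took {elapsed * 1000:.2f} ms")
```

Running it once on an idle disk and once while a large copy is in progress should show the difference being discussed here.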
> There is a way, for some newer disk drives, to do what's
> called a FUA (Force Unit Access) ...
I thought it was possible via the completion notifications from the disk.
AFAIK if a disk is in NCQ mode it will return completion for a command
only when the write was really delivered to the platters. While in
non-NCQ mode the disk immediately returns completion and caches the
write. Is this correct?
Oh, OK, but that's not the problem. I understand now: the problem is that
you want to see all 5 MB of data delivered to the platters, not just one
write command...
So the only way is a queue drain.
So if we want to see faster fsyncs we have to reduce the nr_requests of
a disk, so that the request_queue is short, right?
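
For reference, the current queue depth can be read from sysfs. This is only a sketch: the function name read_nr_requests and the sysfs_root parameter are my own, and "sda" is a placeholder device name.

```python
from pathlib import Path


def read_nr_requests(device, sysfs_root="/sys/block"):
    """Return the nr_requests value the block layer reports for device."""
    path = Path(sysfs_root) / device / "queue" / "nr_requests"
    return int(path.read_text().strip())


if __name__ == "__main__":
    # "sda" is a placeholder; substitute a disk that exists on the local machine.
    try:
        print(read_nr_requests("sda"))
    except OSError as exc:
        print(f"could not read nr_requests: {exc}")
```

Writing a smaller number to the same file as root should shorten the queue; whether that actually makes fsync return faster is exactly the question above.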
There were ideas around for an API for dependencies among BIOs.
e.g. here:
https://lwn.net/Articles/399148/
This would solve the problem of needing a queue drain for an fsync,
right? Ext4 could make the last BIO of the file being synced depend
on all the other BIOs related to the same file, and then wait for the NCQ
completion notification for the last BIO. There wouldn't be a need to
drain the queue any more.
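
To make the idea concrete, here is a toy model, not kernel code. It assumes the disk processes the queue strictly in order, and the names Bio and mb_before_fsync_returns are invented for this illustration. It compares how much data must reach the disk before the fsync can return with a full queue drain versus waiting only on the synced file's last BIO.

```python
from collections import namedtuple

# A queued write: which file it belongs to and its size in MB (toy model).
Bio = namedtuple("Bio", ["file", "mb"])


def mb_before_fsync_returns(queue, synced_file, use_dependencies):
    """MB the disk must write before fsync(synced_file) can return.

    The queue is processed strictly in order; no reordering is modelled.
    """
    if not use_dependencies:
        # Queue drain: everything already queued has to finish first.
        return sum(bio.mb for bio in queue)
    # Dependency model: wait only until the synced file's last BIO completes.
    last = max(
        (i for i, bio in enumerate(queue) if bio.file == synced_file),
        default=-1,
    )
    return sum(bio.mb for bio in queue[: last + 1])


if __name__ == "__main__":
    # 5 MB for file A and 20 MB for file B, as in the example quoted above.
    queue = [Bio("A", 5), Bio("B", 10), Bio("B", 10)]
    print(mb_before_fsync_returns(queue, "A", use_dependencies=False))  # 25
    print(mb_before_fsync_returns(queue, "A", use_dependencies=True))   # 5
```

In this model the gain depends on where file A's BIOs sit in the queue: if they are queued behind file B's, waiting on dependencies alone saves nothing.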
At that point it could even make sense to let all fsync-related I/O
jump to the head of the request_queue, so that fsyncs (hopefully related
to small amounts of data) could return quickly even when there is a
large file streaming or copy in the background filling the whole
request_queue...
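
Again as a toy model only (in-order processing assumed, all names invented for this illustration), moving the synced file's BIOs to the head of the queue would look like this:

```python
from collections import namedtuple

# A queued write: which file it belongs to and its size in MB (toy model).
Bio = namedtuple("Bio", ["file", "mb"])


def promote_to_head(queue, synced_file):
    """Return a new queue with the synced file's BIOs first, relative order kept."""
    urgent = [bio for bio in queue if bio.file == synced_file]
    rest = [bio for bio in queue if bio.file != synced_file]
    return urgent + rest


def mb_until_file_done(queue, synced_file):
    """MB written, in queue order, up to and including the file's last BIO."""
    total = 0
    done = 0
    for bio in queue:
        total += bio.mb
        if bio.file == synced_file:
            done = total
    return done


if __name__ == "__main__":
    queue = [Bio("B", 10), Bio("A", 5), Bio("B", 10)]
    print(mb_until_file_done(queue, "A"))                        # 15
    print(mb_until_file_done(promote_to_head(queue, "A"), "A"))  # 5
```

The sketch ignores what the promotion does to the background stream, which a real scheduler would also have to consider.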
Does what I'm saying make sense?
I understand this feature would require major changes in Linux though...
Thank you for all these explanations,
these things really help us ignorant ext4 users understand...