On 09/21/2016 08:58 PM, Jeff Darcy wrote:
However, my understanding is that filesystems need not maintain the relative
order of writes (as received from the VFS/kernel) on two different fds. Also,
if we have to maintain that order it might come with increased latency, because
"newer" writes would have to wait on "older" ones. That waiting can fill up the
write-behind buffer, eventually leaving the write-behind cache full and unable
to "write-back" newer writes.
IEEE 1003.1, 2013 edition
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was
modified by that write shall return the data specified by the write()
for that position until such byte positions are again modified.
Any subsequent successful write() to the same byte position in the
file shall overwrite that file data.
Note that the reference is to a *file*, not to a file *descriptor*.
It's an application of the general POSIX assumption that time is
simple, locking is cheap (if it's even necessary), and therefore
time-based requirements like linearizability - what this is - are
easy to satisfy. I know that's not very realistic nowadays, but
it's pretty clear: according to the standard as it's still written,
P2's write *is* required to overwrite P1's. Same vs. different fd
or process/thread doesn't even come into play.
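To make that concrete, here is a minimal sketch of the case in question
(the path and sizes are made up for illustration): two descriptors open
on the same regular file, standing in for P1 and P2. Per the text quoted
above, a read after both writes must return the second write's data:

/* Two fds on the same file; the later write to the same byte
 * positions must win, regardless of which fd issued which write. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/overwrite-demo";   /* illustrative path */
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    int fd2 = open(path, O_RDWR);               /* second fd, same file */
    char buf[4];

    assert(fd1 >= 0 && fd2 >= 0);

    /* P1's write ... */
    assert(pwrite(fd1, "old", 3, 0) == 3);
    /* ... then P2's write to the same byte positions. */
    assert(pwrite(fd2, "new", 3, 0) == 3);

    /* Any successful read of those positions must now see P2's data. */
    assert(pread(fd1, buf, 3, 0) == 3);
    assert(memcmp(buf, "new", 3) == 0);

    close(fd1);
    close(fd2);
    unlink(path);
    return 0;
}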
Just for fun, I'll point out that the standard snippet above
doesn't say anything about *non-overlapping* writes. Does POSIX
allow the following?
write A
write B
read B, get new value
read A, get *old* value
This is a non-linearizable result, which would surely violate
some people's (notably POSIX authors') expectations, but good
luck finding anything in that standard which actually precludes
it.
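For what it's worth, here is that sequence as code, with A and B as two
non-overlapping one-byte regions (the path and offsets are illustrative).
On a linearizable filesystem the final assert can never fire; the point
is that the quoted standard text alone doesn't seem to forbid it:

/* Write A, write B, then read B before A; the questionable outcome
 * is observing the new B together with the old A. */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/order-demo", O_RDWR | O_CREAT | O_TRUNC, 0644);
    char a, b;

    assert(fd >= 0);
    assert(pwrite(fd, "0", 1, 0)    == 1);   /* old value of A */
    assert(pwrite(fd, "0", 1, 4096) == 1);   /* old value of B */

    assert(pwrite(fd, "1", 1, 0)    == 1);   /* write A (new) */
    assert(pwrite(fd, "1", 1, 4096) == 1);   /* write B (new) */

    assert(pread(fd, &b, 1, 4096) == 1);     /* read B first */
    assert(pread(fd, &a, 1, 0)    == 1);     /* then read A */

    if (b == '1')
        assert(a == '1');   /* linearizability would require this */

    close(fd);
    return 0;
}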
I will reply to both comments here.
First, I think that all file systems will behave this way, since this is really
a function of how the page cache and O_DIRECT work.
More broadly, this is not a promise or a hard and fast guarantee - the
traditional approach for applications that do concurrent writes is to use
either whole-file or byte-range locking whenever one or more threads/processes
are doing I/O to the same file concurrently.
I don't understand the Jeff snippet above - if they are non-overlapping writes
to different offsets, this would never happen.
If the writes are to the same offset and happened at different times, it would
not happen either.
If they are to the same offset and at the same time, then you can have undefined
results, where you might get fragments of A and fragments of B (and you might
see some odd things if the write spans pages/blocks).
This last case is where the usual best practice of locking comes in, as in the
sketch below.
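As a concrete illustration of that best practice (the path, offset, and length
are made up), a writer can take a byte-range lock with fcntl() before touching
the contested region, so concurrent writers are serialized rather than
interleaved at page/block granularity:

/* Take an exclusive byte-range lock over the region before writing,
 * then release it, so concurrent writers to the same bytes serialize. */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/lock-demo", O_RDWR | O_CREAT, 0644);
    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive (write) lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,          /* lock bytes [0, 4096) */
        .l_len    = 4096,
    };

    assert(fd >= 0);

    /* Block until we own the range, write, then release. */
    assert(fcntl(fd, F_SETLKW, &fl) == 0);
    assert(pwrite(fd, "A", 1, 0) == 1);

    fl.l_type = F_UNLCK;
    assert(fcntl(fd, F_SETLK, &fl) == 0);

    close(fd);
    return 0;
}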
Ric