Here's a quote from a paper titled: Non-blocking Writes to Files
https://www.usenix.org/conference/fast15/technical-sessions/presentation/campello
-----
Ordering of Page Updates.
Non-blocking writes may alter the sequence in which patches to
different pages get applied since the page fetches may complete
out-of-order. Non-blocking writes only replace writes that are
to memory that are not guaranteed to be reflected to persistent
storage in any particular sequence. Thus, ordering violations in
updates of in-memory pages are crash-safe.
Page Persistence and Syncs.
If an application would like explicit disk ordering for memory
page updates, it would execute a blocking flush operation
(e.g., fsync ) subsequent to each operation. The flush operation
causes the OS to force the fetch of any page indexed as NBW even
if it has not been allocated yet. The OS then obtains the page
lock, waits for the page fetch, and applies any outstanding
patches, before flushing the page and returning control to the
application. Ordering of disk writes are thus preserved with
non-blocking writes.
-----
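To make the paper's mechanism concrete, here is a minimal sketch in plain C of the write-then-fsync pattern it describes (the file name, offsets and payloads are illustrative, not from the paper): the blocking fsync() between the two updates forces the first page to be fetched, patched and persisted before the second update is even issued.

/* Sketch: enforcing on-disk ordering of two updates with a blocking fsync().
 * File name, offsets and payloads are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("datafile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    const char a[] = "update-1";
    const char b[] = "update-2";

    /* First update: lands in the page cache; may trigger a page fetch. */
    if (pwrite(fd, a, sizeof(a) - 1, 0) < 0) { perror("pwrite"); return EXIT_FAILURE; }

    /* Blocking flush: the OS must resolve any pending page fetch, apply
     * outstanding patches and persist the page before returning, so
     * update-1 is durable before update-2 is issued. */
    if (fsync(fd) < 0) { perror("fsync"); return EXIT_FAILURE; }

    /* Second update: ordered after the first on disk because of the fsync. */
    if (pwrite(fd, b, sizeof(b) - 1, 4096) < 0) { perror("pwrite"); return EXIT_FAILURE; }
    if (fsync(fd) < 0) { perror("fsync"); return EXIT_FAILURE; }

    close(fd);
    return 0;
}

Without the intermediate fsync(), the two updates may reach persistent storage in either order, which the paper argues is still crash-safe for plain in-memory page updates.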
Milind
On 02/10/2017 01:37 PM, Xavier Hernandez wrote:
Hi Raghavendra,
On 10/02/17 04:51, Raghavendra Gowdappa wrote:
+gluster-devel
----- Original Message -----
From: "Milind Changire" <mchangir@xxxxxxxxxx>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Cc: "rhs-zteam" <rhs-zteam@xxxxxxxxxx>
Sent: Thursday, February 9, 2017 11:00:18 PM
Subject: patch for "limited performance for disperse volumes"
My first comment was:
It looks like the patch for "limited performance for disperse volumes" [1]
is going to be helpful for all other types of volumes as well; but how do
we guarantee ordering for writes over the same fd for the same offset and
length in the file?
Then, thinking it over a bit, and in case you missed my comment over IRC:
I was thinking about network multi-pathing and RPC requests (two writes)
being routed through different interfaces to gluster nodes, which might
lead to a non-increasing transaction ID sequence and hence to an incorrect
final value if the older write ends up being committed last to the same
offset+length.
Then it dawned on me that, for blocking operations, the write() call won't
return until the data is safe on disk across the network, or until the
intermediate translators have cached it appropriately to be written behind.
So would the patch work for two non-blocking writes originating for the
same fd, from the same thread, for the same offset+length, being routed
over multi-pathing, with write #2 getting routed quicker than write #1?
To be honest, I've not considered the case of asynchronous writes from the
application till now. What ordering guarantee do the OS/filesystems provide
for two async writes? For example, if there are two writes w1 and w2, when
is w2 issued?
* after the callback (cbk) of w1 is called, or
* in parallel, just after async_write (w1) returns (the cbk of w1 has not
been invoked yet)?
What do POSIX or other standards (or the expectation from the OS) say about
ordering in case 2 above?
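To make case 2 concrete, here is a rough sketch using POSIX AIO (this is only an assumption about how the application might submit the I/O; the file name and payloads are illustrative, and older glibc may require linking with -lrt): w2 is submitted immediately after aio_write(w1) returns, before w1's completion has been observed, so both writes can be in flight at once.

/* Sketch of "case 2": two async writes to the same fd and offset, issued
 * back to back from one thread via POSIX AIO, with neither completion
 * awaited before the second is submitted. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("datafile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    static char buf1[] = "w1";
    static char buf2[] = "w2";

    struct aiocb w1 = { .aio_fildes = fd, .aio_buf = buf1,
                        .aio_nbytes = sizeof(buf1) - 1, .aio_offset = 0 };
    struct aiocb w2 = { .aio_fildes = fd, .aio_buf = buf2,
                        .aio_nbytes = sizeof(buf2) - 1, .aio_offset = 0 };

    /* Case 2: w2 is submitted right after aio_write(w1) returns, i.e.
     * before w1's callback/completion is seen. Both may be in flight. */
    if (aio_write(&w1) < 0) { perror("aio_write w1"); return 1; }
    if (aio_write(&w2) < 0) { perror("aio_write w2"); return 1; }

    /* Wait for both; which one reaches the file last is not specified. */
    const struct aiocb *const list[] = { &w1, &w2 };
    while (aio_error(&w1) == EINPROGRESS || aio_error(&w2) == EINPROGRESS)
        aio_suspend(list, 2, NULL);

    printf("w1 ret=%zd, w2 ret=%zd\n", aio_return(&w1), aio_return(&w2));
    close(fd);
    return 0;
}

If the application instead follows case 1, it would call aio_write(&w2) only from (or after) w1's completion handling, which trivially serialises the two writes.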
I'm not an expert on POSIX. But I've found this [1]:
2.9.7 Thread Interactions with Regular File Operations
All of the following functions shall be atomic with respect to
each other in the effects specified in POSIX.1-2008 when they
operate on regular files or symbolic links: [...] write [...]
If two threads each call one of these functions, each call shall
either see all of the specified effects of the other call, or none
of them. The requirement on the close() function shall also apply
whenever a file descriptor is successfully closed, however caused
(for example, as a consequence of calling close(), calling dup2(),
or of process termination).
Not sure if this also applies to write requests issued asynchronously
from the same thread, but this would be the worst case (if the OS
already orders them, we won't have any problem).
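As a concrete (and purely illustrative) reading of the clause quoted above, here is a sketch of two threads writing the same 8-byte region of one file; POSIX requires each write to see all or none of the other's effects, so the region should end up holding one buffer in full, never an interleaving (build with -pthread; the file name and payloads are made up).

/* Sketch of POSIX 2.9.7: two threads pwrite() the same region of a file;
 * each call is atomic with respect to the other, so the final content is
 * all A's or all B's, never a mix. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int fd;

static void *writer(void *arg)
{
    const char *msg = arg;                    /* "AAAAAAAA" or "BBBBBBBB" */
    if (pwrite(fd, msg, strlen(msg), 0) < 0)  /* atomic w.r.t. the other pwrite */
        perror("pwrite");
    return NULL;
}

int main(void)
{
    static char a[] = "AAAAAAAA", b[] = "BBBBBBBB";
    pthread_t t1, t2;
    char out[16] = { 0 };

    fd = open("datafile", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    pthread_create(&t1, NULL, writer, a);
    pthread_create(&t2, NULL, writer, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Whichever write landed last wins, but it wins in full. */
    if (pread(fd, out, 8, 0) > 0)
        printf("final content: %.8s\n", out);

    close(fd);
    return 0;
}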
As I see it, this is already satisfied by EC because it doesn't allow two
writes to execute concurrently. They can be reordered if the second one
arrives before the first one, but they are executed atomically, as POSIX
requires. Not sure if AFR also satisfies this condition, but I think so.
From the point of view of EC it's irrelevant whether the write comes from
the same thread or from different processes on different clients. They are
handled in the same way.
However, a thing to be aware of (from the man page of write(2)):
[...] among the effects that should be atomic across threads (and
processes) are updates of the file offset. However, on Linux before
version 3.14, this was not the case: if two processes that share an
open file description (see open(2)) perform a write() (or
writev(2)) at the same time, then the I/O operations were not atomic
with respect to updating the file offset, with the result that the
blocks of data output by the two processes might (incorrectly)
overlap. This problem was fixed in Linux 3.14.
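A small sketch of that shared-offset scenario (illustrative only): after fork() the parent and child share one open file description, so both write() calls advance the same file offset; since Linux 3.14 the offset update is atomic and the two records end up one after the other instead of possibly overlapping.

/* Sketch of the shared-offset case from write(2): parent and child share
 * one open file description via fork(), so both write() calls advance the
 * same file offset. Since Linux 3.14 the offset update is atomic. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd = open("datafile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    const char *msg = (pid == 0) ? "child-record\n" : "parent-record\n";

    /* Same open file description, hence the same file offset: each record
     * should land after the other's data, never on top of it (>= 3.14). */
    if (write(fd, msg, strlen(msg)) < 0)
        perror("write");

    if (pid == 0)
        _exit(0);

    wait(NULL);
    close(fd);
    return 0;
}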
Xavi
[1]
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
[1] https://review.gluster.org/15036
just thinking aloud
--
Milind
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel