On Thu, Sep 29, 2016 at 11:11 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
On Wed, Sep 28, 2016 at 7:37 PM, Shyam <srangana@xxxxxxxxxx> wrote:
On 09/27/2016 04:02 AM, Poornima Gurusiddaiah wrote:
W.r.t. Samba consuming this, it would require a great deal of code change in Samba. Currently Samba has no concept of getting a buffer from the underlying file system; the file system comes into the picture only at the last layer (the gluster plugin), where system calls are replaced by libgfapi calls. Hence this is not readily consumable by Samba, and I think the same will be the case with NFS-Ganesha; I will let the Ganesha folks comment on that.
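To make the layering point concrete, here is a simplified sketch (not the actual vfs_glusterfs.c code; the function name is mine) of what the gluster plugin does today: by the time Samba's gluster VFS module runs, the SMB payload already sits in a buffer allocated many layers higher up, so the plugin can only pass it through. A zero-copy API that insists on gluster-owned buffers would force a memcpy at exactly this spot.

    #include <sys/types.h>
    #include <glusterfs/api/glfs.h>

    /* Samba hands us 'data'; the plugin does not get to choose where
     * that buffer lives. */
    static ssize_t
    vfs_gluster_pwrite_sketch (glfs_fd_t *glfd, const void *data,
                               size_t n, off_t offset)
    {
            /* Today: the application-owned buffer goes straight to
             * gfapi, which copies it internally if it must. */
            return glfs_pwrite (glfd, data, n, offset, 0);
    }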
This is exactly my reservation about the nature of the change [2] made in this patch. We expect all consumers to use *our* buffer management system, which may not be possible all the time.
From the majority of consumers that I know of, other than CommVault (which Sachin cited as benefiting), none can use the gluster buffers at the moment (Ganesha, Samba, QEMU). (I would like to understand, just for clarity, how CommVault can use gluster buffers in this situation without copying data out to them.)

+Jeff Cody, for comments on QEMU.
This is the reason I posted the comments at [1], stating we should copy out the buffer when Gluster needs it preserved, but use application-provided buffers as long as we can.

My concerns here are:

* We are just moving the copy from the gfapi layer to write-behind. Though I am not sure what percentage of writes that hit write-behind are "written back", I would assume it is a significant percentage (otherwise there would be no benefit in having write-behind). However, we can try this approach and get some perf data before we make a decision.

* Buffer management. All gluster code uses iobufs/iobrefs to manage buffers of relatively large size. With the approach suggested above, I see two concerns (a sketch of (a) follows below):

  a. write-behind has to differentiate between iobufs that need copying (write calls through the gfapi layer) and iobufs that can just be ref'd (writes from FUSE etc.) when "writing back" the write. This adds more complexity.

  b. For the case where write-behind chooses not to "write back" the write, we need a way of encapsulating the application buffer into an iobuf/iobref. This might need changes in the iobuf infra.
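A rough sketch of concern (a), with invented names (WB_APP_OWNED_BUF and wb_capture_payload are not real write-behind code), showing the decision write-behind would have to make when queueing a write:

    #include <string.h>
    #include <sys/uio.h>
    #include <glusterfs/iobuf.h>

    #define WB_APP_OWNED_BUF 0x1   /* hypothetical request flag */

    static int
    wb_capture_payload (struct iobuf_pool *pool, struct iobref *iobref,
                        struct iovec *vector, struct iobuf *iobuf,
                        int flags)
    {
            if (flags & WB_APP_OWNED_BUF) {
                    /* The gfapi caller may reuse its buffer the moment
                     * we unwind, so copy into a pool iobuf before
                     * queueing the write. */
                    struct iobuf *copy = iobuf_get2 (pool,
                                                     vector->iov_len);
                    if (!copy)
                            return -1;
                    memcpy (iobuf_ptr (copy), vector->iov_base,
                            vector->iov_len);
                    iobref_add (iobref, copy);  /* iobref takes a ref */
                    iobuf_unref (copy);         /* drop our local ref */
                    return 0;
            }
            /* FUSE et al. already hand us a gluster iobuf backing this
             * vector; just hold a ref -- no copy needed. */
            return iobref_add (iobref, iobuf);
    }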
I do see the advantages of zero-copy, but not when the gluster API is managing the buffers; it just makes it more tedious for applications to use this scheme, IMHO.
Another point we can consider here is gfapi (and the gluster internal xlator stack) providing both behaviors, as mentioned below:
1. Making the Glusterfs xlator stack use application buffers.
2. Forcing applications to use only gluster-managed buffers if they want zero copy.
Let the applications choose which interface to use, based on their use cases (there is a trade-off in terms of performance, code changes, legacy applications that are resistant to change, etc.); a rough sketch of the two interfaces follows below.
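From an application's point of view, the two interfaces might look as follows. glfs_write() is the existing gfapi call; glfs_get_buffer(), glfs_zerocopy_writev() and glfs_free_buffer() are along the lines of what [2] proposes and should be read as illustrative names, not a final API:

    #include <string.h>
    #include <sys/uio.h>
    #include <glusterfs/api/glfs.h>

    /* 1. Application-owned buffer: no application changes, gfapi
     *    copies internally. */
    static ssize_t
    write_app_buffer (glfs_fd_t *fd, const char *data, size_t len)
    {
            return glfs_write (fd, data, len, 0);
    }

    /* 2. Gluster-owned buffer: the application fills a buffer gfapi
     *    hands out, so gfapi never copies. */
    static ssize_t
    write_gluster_buffer (glfs_fd_t *fd, const char *data, size_t len)
    {
            void *buf = glfs_get_buffer (fd, len);           /* proposed */
            if (!buf)
                    return -1;
            memcpy (buf, data, len);  /* ideally data is produced here */
            struct iovec iov = { .iov_base = buf, .iov_len = len };
            ssize_t ret = glfs_zerocopy_writev (fd, &iov, 1, 0); /* proposed */
            glfs_free_buffer (fd, buf);                      /* proposed */
            return ret;
    }

The trade-off in option 2 is visible in the memcpy: the benefit only materializes if the application can generate its payload in place in the gluster buffer, which is exactly the code change legacy applications resist.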
Could we think through and negate (if possible) the idea of using the application-passed buffers as is? One caveat here seems to be RDMA (we need the memory registered, if I am not wrong), as that would involve a copy into RDMA buffers when using application-passed buffers.

Actually, RDMA is not a problem in the current implementation (ruling out suggestions by others to use pre-registered iobufs for managing io-cache etc.). This is because, in the current implementation, the responsibility of registering the memory region lies with transport/rdma. In other words, transport/rdma does not expect pre-registered buffers.

What are the other pitfalls?
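For reference, an illustrative libibverbs fragment (not gluster's transport/rdma code) of why per-request registration makes application buffers workable:

    #include <infiniband/verbs.h>

    /* Register whatever buffer we were handed, right before posting
     * the work request. Registration pins the pages and yields the
     * lkey/rkey the WR needs; it works on any memory, including a
     * buffer the application allocated. The cost is the registration
     * itself, not a data copy. */
    static struct ibv_mr *
    register_app_buffer (struct ibv_pd *pd, void *app_buf, size_t len)
    {
            return ibv_reg_mr (pd, app_buf, len,
                               IBV_ACCESS_LOCAL_WRITE |
                               IBV_ACCESS_REMOTE_READ |
                               IBV_ACCESS_REMOTE_WRITE);
    }

    /* ...post the WR using mr->lkey, then ibv_dereg_mr (mr) once the
     * completion arrives. */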
[1] http://www.gluster.org/pipermail/gluster-devel/2016-August/050622.html
[2] http://review.gluster.org/#/c/14784/
Regards,
Poornima
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
--
Raghavendra G