Re: gfapi zero copy write enhancement

On 08/25/2016 02:46 AM, Saravanakumar Arumugam wrote:
Hi,

On 08/25/2016 12:58 AM, Shyam wrote:
Hi,

I was attempting to review this change [1], and for a long time I have
wanted to understand why we need it and how we should go about
achieving it. As my understanding is incomplete, I am starting with
some questions.

1) In the writev FOP, what is the role/use of the iobref parameter?
- I do not see the posix xlator using this
- The payload is carried in the vector, rather than in the iobref
- Across the code, other than protocol/client, which (sort of)
serializes this, I do not see it having any use

So what am I missing?

2) Coming to the current change, what prevents us from doing it as in [2]?
- in short, just pass in the buffer received as part of the iovec

[2] is not necessarily clean, just a hack: it assumes an iovcount of 1
always, and I have only tested it with a write call, not a writev
call. (Just stating this before we start a code review of the
same ;) )

3) From the discussions in [3] and [4] I understand that this is meant
to eliminate a copy when working with RDMA. Further, Avati's response
in that thread discusses why we should leave the memory management of
read/write buffers to the applications rather than use/reuse gluster
buffers.

So, in the long term, if this is for RDMA, what justifies asking
applications to use gluster buffers rather than doing it the other way
around?

4) Why should applications not reuse buffers, and instead ask for
fresh/new buffers for each write call?
The reason is that the buffer might be ref'ed in some translator, like
io-cache or write-behind.

Discussion in patch:
------------------------------------------------------------------------------

<< IMPORTANT: Buffer should not be reused across the zero copy write
operation. Is this still valid, given that the application allocates
and frees the buffer? =============================
Yes, this is still valid: if the application tries to reuse the buffer,
it might see a hang.
The reason is that the buffer might be ref'ed in some translator, like
io-cache or write-behind.
------------------------------------------------------------------------------

Thank you. I followed this in the code and now understand that we could be stashing away the iov pointers for later use in write-behind, hence the copy in the gfapi layer.




5) What is the performance gain noticed with this approach? (The
thread that (re)started this is [5].) For Commvault Simpana:
- What were the perf gains due to this approach?
- How does the application use the write buffers?
  - Meaning, is the buffer received from Gluster used to populate data
from the network? I am curious how this application uses these
buffers, and where data gets copied into them from.

(slightly off-topic to this question)
Sometimes the performance gained is not in terms of read/write rates,
but in terms of freed CPU.
Just to give an example: with the copy, CPU occupancy might be 70%;
without the copy, 40%.

Agreed, and valid. I wanted to know what the gain was; the more color Sachin can add here, the better.

May I ask how this is being tested? If there is a program for it, could you pass it along?


But, Sachin can share the results.

Eliminating a copy in glfs_write seems trivial (if my hack and the
answers to the first question match my assumptions), so I am wondering
what we are attempting here, or what I am missing.

From what I understand, there is a layer of separation between
libgfapi and gluster.

Gluster plays with the buffer in whatever way it likes (read:
different translators), and hence allocation and freeing should happen
from Gluster. Otherwise, if the application needs to have control over
the buffer, a copy is involved (at the gluster layer).

Ok, let's target this part, which is where an alternative to the current approach exists.

So, why not use the application buffer till the (first) point where we decide to actually store the buffers (or buffer pointers, as is the current mechanism) for later use? IOW, if we decide to fake a write, as in write-behind, let's take a copy of the buffers then, rather than force applications to use gluster buffers. Wouldn't that be better?

Looking at FUSE and gNFS: in both cases we either read from the FUSE reader end or receive RPCs from the network, so we already have to allocate and supply a buffer for reading the requests (in this case the write request) from these ends. This means we have buffers that we can stash away, by taking a ref in write-behind or other places.

In the case of gfapi, since the application passes in a buffer, we create a copy to adhere to the current mechanism. Why not consider the buffers as non-gluster-owned and take ownership (i.e., copy) only when needed, and thereby address the current problem?


Thanks,
Saravana

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel


