Re: Reply: Re: Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

li.ping288@xxxxxxxxxx · Fri, 29 Jan 2016 16:55:12 +0800

LiPing10168633/user/zte_ltd wrote on 2016/01/28 21:40:30:

> From: 李平10168633/user/zte_ltd
> To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>, gluster-devel@xxxxxxxxxxx,
> Cc: li.yi79@xxxxxxxxxx, Liu.Jianjun3@xxxxxxxxxx, yang.bin18@xxxxxxxxxx, zhou.shigang37@xxxxxxxxxx
> Date: 2016/01/28 21:40
> Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
> 

> Sorry for the late reply.

> 

> Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/25 17:48:06:

> 

> > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > To: li.ping288@xxxxxxxxxx,
> > Cc: li.yi79@xxxxxxxxxx, zhou.shigang37@xxxxxxxxxx, Liu.Jianjun3@xxxxxxxxxx, yang.bin18@xxxxxxxxxx
> > Date: 2016/01/25 17:48
> > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

> > 

> > 

> > On 01/25/2016 03:09 PM, li.ping288@xxxxxxxxxx wrote:

> > Hi Pranith, 

> > 

> > I'd be glad to have the chance to contribute to open source.
> > This is my first time submitting a patch to GlusterFS, so I'm not
> > quite familiar with the code review and submission procedures.
> >
> > I'll try to get it done ASAP. By the way, are there any guidelines
> > for doing this work?

> > http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow
> > may be helpful. Feel free to ask if you have any doubts.

> > 

> > How do you guys use glusterfs?

> > 

> > Pranith

> Thanks for the helpful tips. We currently use GlusterFS to build the
> shared storage for distributed cluster nodes.

> 

> Here are the solutions I have been pondering over the past few days:

> 

> 1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND modification, because this
>    optimization only helps appending write fops, while most writes are
>    not appends. Hence I think it is not worth optimizing for this
>    low-probability case at the cost of a performance drop for the vast
>    majority of AFR writes.

> 2. Make the fixed GLUSTERFS_WRITE_IS_APPEND dictionary option in
>    afr_writev dynamic, i.e. add a new configurable AFR option
>    "write_is_append", just like the existing "ensure-durability". It
>    could be turned on when AFR write performance is not the main concern,
>    and off when performance is needed.

>      

> I have been trying to find a way in posix_writev to predict an appending
> write in advance, so that it could decide early whether to lock and hold
> the lock as briefly as possible, but I have not found one.

3. Another compromise that crossed my mind today is to make WRITE_IS_APPEND
not take effect for O_DIRECT writes. It is already ineffective for SYNC
writes, and the performance of page-cache writes is not so bad (though of
course not as good as with no locking).
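A sketch of the flag test that option 3 implies, assuming a Linux/glibc build (`O_DIRECT` is only exposed with `_GNU_SOURCE`). `append_check_for_flags()` is a hypothetical helper, not an existing GlusterFS function: it returns false for O_DIRECT and synchronous (O_SYNC/O_DSYNC) descriptors, so only buffered page-cache writes would still request the append check.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> with glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical policy for solution 3: request the append check only for
 * buffered (page-cache) writes; skip it for O_DIRECT and synchronous fds. */
static bool append_check_for_flags(int fd_flags)
{
    const int bypass = O_DIRECT | O_SYNC | O_DSYNC;
    assert((fd_flags & O_ACCMODE) != O_RDONLY); /* only writable fds get here */
    return (fd_flags & bypass) == 0;
}
```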

I would prefer the second and third approaches.

Are there any other opinions?

> 

> Any other good ideas are appreciated.

> 

> Ping.Li

> 

> > 

> > Thanks & Best Regards. 

> > 

> > Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/23 14:01:36:

> > 

> > > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > > To: li.ping288@xxxxxxxxxx, gluster-devel@xxxxxxxxxxx,
> > > Cc: li.yi79@xxxxxxxxxx, Liu.Jianjun3@xxxxxxxxxx, zhou.shigang37@xxxxxxxxxx, yang.bin18@xxxxxxxxxx
> > > Date: 2016/01/23 14:02
> > > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

> > > 

> > > 

> > 

> > > On 01/22/2016 07:14 AM, li.ping288@xxxxxxxxxx wrote: 

> > > Hi Pranith, thank you for your reply.

> > > 

> > > Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/20 18:51:19:

> > > 

> > > > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > > > To: li.ping288@xxxxxxxxxx, gluster-devel@xxxxxxxxxxx,
> > > > Date: 2016/01/20 18:51
> > > > Subject: Re: [Gluster-devel] Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev

> > > > 

> > > > Sorry for the delay in response.

> > > 

> > > > On 01/15/2016 02:34 PM, li.ping288@xxxxxxxxxx wrote:

> > > > Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the
> > > > glusterfs client side makes posix_writev on the server side handle
> > > > IO write fops serially instead of in parallel.
> > > >
> > > > i.e. multiple io-worker threads carrying out IO write fops are
> > > > blocked in posix_writev and execute the final pwrite/pwritev in the
> > > > __posix_writev function ONE AFTER ANOTHER.

> > > > 

> > > > For example:
> > > >
> > > > thread1: iot_worker -> ... -> posix_writev()   |
> > > > thread2: iot_worker -> ... -> posix_writev()   |
> > > > thread3: iot_worker -> ... -> posix_writev()   -> __posix_writev()
> > > > thread4: iot_worker -> ... -> posix_writev()   |

> > > > 

> > > > There are 4 iot_worker threads doing 128KB IO write fops as above,
> > > > but only one can execute the __posix_writev function while the
> > > > others have to wait.
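To make the serialization above concrete, here is a self-contained pthreads model; it is not GlusterFS code. Four hypothetical workers each perform a simulated 128KB write via `usleep()`. When `serialize` is true they hold one shared mutex around the write, standing in for the lock that posix_writev is assumed to take on the inode when GLUSTERFS_WRITE_IS_APPEND is set. `run_writers()` reports the peak number of workers inside the write at the same time.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define NWORKERS 4

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int in_write;     /* workers currently inside the write */
static atomic_int peak_writers; /* highest concurrency observed */
static bool serialize_writes;   /* models GLUSTERFS_WRITE_IS_APPEND being set */

/* Simulated pwrite(): records how many workers are inside at once. */
static void fake_pwrite(void)
{
    int now = atomic_fetch_add(&in_write, 1) + 1;
    int seen = atomic_load(&peak_writers);
    /* Raise the peak if we exceed it; the CAS reloads `seen` on failure. */
    while (now > seen &&
           !atomic_compare_exchange_weak(&peak_writers, &seen, now))
        ;
    usleep(10000); /* stand-in for one 128KB O_DIRECT write */
    atomic_fetch_sub(&in_write, 1);
}

static void *worker(void *arg)
{
    (void)arg;
    if (serialize_writes)
        pthread_mutex_lock(&inode_lock);
    fake_pwrite();
    if (serialize_writes)
        pthread_mutex_unlock(&inode_lock);
    return NULL;
}

/* Runs NWORKERS concurrent writers and returns the peak concurrency seen. */
int run_writers(bool serialize)
{
    pthread_t tids[NWORKERS];
    serialize_writes = serialize;
    atomic_store(&in_write, 0);
    atomic_store(&peak_writers, 0);
    for (int i = 0; i < NWORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&peak_writers);
}
```

With `serialize` set the peak is always 1, matching the one-at-a-time behaviour described above; without it the peak is usually above 1, although the exact value depends on thread scheduling.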

> > > > 

> > > > However, if storage.linux-aio (off by default) is enabled on the AFR
> > > > volume, the iot_worker uses posix_aio_writev instead of posix_writev
> > > > to write data. posix_aio_writev is not affected by
> > > > GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write performance
> > > > goes up.
> > > >
> > > > I think this is a bug :-(

> > > 

> > > Yeah, I agree with you. I suppose setting GLUSTERFS_WRITE_IS_APPEND
> > > in afr_writev is a misuse.

> > > I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND change
> > > on the review website:
> > > http://review.gluster.org/#/c/5501/

> > > 

> > > The initial purpose seems to be to avoid an unnecessary fsync() in the
> > > afr_changelog_post_op_safe function when the write position is at the
> > > current end of the file, as detected by
> > > (preop.ia_size == offset || (fd->flags & O_APPEND)) in posix_writev.
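The quoted condition can be written as a small predicate. `write_is_append()` below is a hypothetical stand-alone helper that mirrors `(preop.ia_size == offset || (fd->flags & O_APPEND))`; it is not the actual posix_writev code. The comment records an assumption about why the check is done under a lock.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-alone version of the check quoted above:
 *   (preop.ia_size == offset || (fd->flags & O_APPEND))
 * Assumption: the size sample and the write must not be interleaved with
 * another write that changes the file size, or the answer could be stale;
 * presumably that is why posix_writev serializes writes for this check. */
static bool write_is_append(off_t preop_size, off_t offset, int fd_flags)
{
    assert(preop_size >= 0 && offset >= 0);
    return preop_size == offset || (fd_flags & O_APPEND) != 0;
}
```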

> > > 

> > > Compared with the AFR write performance loss, I think this costs too
> > > much.

> > > 

> > > I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable,
> > > just like ensure-durability in AFR.

> > > 

> > > You are right, it doesn't make sense to put this option in the
> > > dictionary if ensure-durability is off. http://review.gluster.org/13285
> > > addresses this. Do you want to try it out?
> > > Thanks for doing most of the work :-). Do let me know if you want to
> > > raise a bug for this, or I can take it up if you don't have time.

> > > 

> > > Pranith 

> > > 

> > > > 

> > > > So, my question is whether an AFR volume can work fine with the
> > > > storage.linux-aio configuration, which bypasses the
> > > > GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, and why glusterfs
> > > > keeps posix_aio_writev different from posix_writev?
> > > >
> > > > Any replies to clear up my confusion would be appreciated, and thanks
> > > > in advance.

> > > > What is the workload you have? Multiple writers on the same file?

> > > 

> > > I tested the AFR gluster volume with fio like this:
> > > fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k --size=20G --numjobs=8 --runtime=60 --group_reporting --name=afr_test --iodepth=1 --ioengine=libaio

> > > 

> > > The GlusterFS bricks are two IBM X3550 M3 servers.

> > > 

> > > The local disk direct-write throughput with a 128KB IO request size
> > > is about 18MB/s with a single thread and 80MB/s with 8 threads.
> > >
> > > With GLUSTERFS_WRITE_IS_APPEND set, the AFR gluster volume write
> > > throughput is 18MB/s, the same as a single thread; without it, the
> > > throughput is about 75MB/s (network bandwidth is sufficient).

> > > 

> > > > 

> > > > Pranith 

> > > > 

> > > > 

> > > > --------------------------------------------------------
> > > > ZTE Information Security Notice: The information contained in this
> > > > mail (and any attachment transmitted herewith) is privileged and
> > > > confidential and is intended for the exclusive use of the addressee(s).
> > > > If you are not an intended recipient, any disclosure, reproduction,
> > > > distribution or other dissemination or use of the information
> > > > contained is strictly prohibited. If you have received this mail in
> > > > error, please delete it and notify us immediately.

> > > > 

> > > 

> > > > 

> > > > 

> > > 

> > > > _______________________________________________

> > > > Gluster-devel mailing list

> > > > Gluster-devel@xxxxxxxxxxx

> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
