LiPing10168633/user/zte_ltd wrote on 2016/01/28 21:40:30:
> From: 李平10168633/user/zte_ltd
> To: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>, gluster-devel@xxxxxxxxxxx,
> Cc: li.yi79@xxxxxxxxxx, Liu.Jianjun3@xxxxxxxxxx,
> yang.bin18@xxxxxxxxxx, zhou.shigang37@xxxxxxxxxx
> Date: 2016/01/28 21:40
> Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write
> performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND
> in afr_writev
>
> Sorry for the late reply.
>
> Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/25 17:48:06:
>
> > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > To: li.ping288@xxxxxxxxxx,
> > Cc: li.yi79@xxxxxxxxxx, zhou.shigang37@xxxxxxxxxx,
> > Liu.Jianjun3@xxxxxxxxxx, yang.bin18@xxxxxxxxxx
> > Date: 2016/01/25 17:48
> > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write
> > performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND
> > in afr_writev
> >
> >
> > On 01/25/2016 03:09 PM, li.ping288@xxxxxxxxxx wrote:
> > Hi Pranith,
> >
> > I'd be glad to have the chance to contribute to open source.
> > This is my first time submitting a patch for GlusterFS, so I'm not
> > yet familiar with the code review and submission procedures.
> >
> > I'll try to get it done ASAP. By the way, are there any guidelines for this work?
> > http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow
> > may be helpful. Feel free to ask any questions you may have.
> >
> > How do you guys use glusterfs?
> >
> > Pranith
> Thanks for the helpful tips. We currently use glusterfs to build the
> shared storage for distributed cluster nodes.
>
> Here are the solutions I have been considering over the past few days:
>
> 1. Revert the AFR GLUSTERFS_WRITE_IS_APPEND change. This optimization
> only helps appending write fops, but most writes are not appends. I
> do not think it is worth optimizing a low-probability case at the
> cost of a performance drop for the vast majority of AFR writes.
> 2. Make the fixed GLUSTERFS_WRITE_IS_APPEND dictionary option in
> afr_writev configurable, i.e. add a new dynamically configurable
> option "write_is_append" for AFR, just like the existing
> "ensure-durability". It could be turned on if AFR write performance
> is not the main concern and off if performance is required.
>
> I have been trying to find a way in posix_writev to predict an
> appending write in advance, so that it takes the lock only when needed
> and holds it as briefly as possible, but I have not found one.
3. Another compromise that crossed my mind today is to make WRITE_IS_APPEND
have no effect for O_DIRECT writes. It already has no effect for SYNC writes, and the performance of
page cache writes is not so bad (though of course not as good as with no locking).
I would prefer options 2 and 3.
Are there any other opinions?
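
Options 2 and 3 can be read as a small decision rule on the client side. The following is only a minimal sketch in Python, not GlusterFS code: the function name `should_request_append_check` and its parameters are hypothetical, and the `write_is_append` option is the one proposed in this message, not an existing volume option. `os.O_DIRECT` is available on Linux.

```python
import os


def should_request_append_check(write_is_append_enabled, open_flags):
    """Decide whether a write should ask the brick for the append check.

    Hypothetical helper: models option 2 (a configurable switch) and
    option 3 (skip the check for O_DIRECT writes) from this message.
    """
    if not write_is_append_enabled:
        # Option 2: the administrator turned the check off for performance.
        return False
    if open_flags & os.O_DIRECT:
        # Option 3: O_DIRECT writes never request the check.
        return False
    return True


if __name__ == "__main__":
    print(should_request_append_check(True, os.O_WRONLY))
    print(should_request_append_check(True, os.O_WRONLY | os.O_DIRECT))
    print(should_request_append_check(False, os.O_WRONLY))
```

With a rule like this, buffered writes keep the existing behaviour unless the option is switched off, while O_DIRECT writes avoid the serialization entirely.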
>
> Any other good ideas are appreciated.
>
> Ping.Li
>
> >
> > Thanks & Best Regards.
> >
> > Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/23 14:01:36:
> >
> > > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > > To: li.ping288@xxxxxxxxxx, gluster-devel@xxxxxxxxxxx,
> > > Cc: li.yi79@xxxxxxxxxx, Liu.Jianjun3@xxxxxxxxxx,
> > > zhou.shigang37@xxxxxxxxxx, yang.bin18@xxxxxxxxxx
> > > Date: 2016/01/23 14:02
> > > Subject: Re: Reply: Re: [Gluster-devel] Gluster AFR volume write
> > > performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND
> > > in afr_writev
> > >
> > >
> >
> > > On 01/22/2016 07:14 AM, li.ping288@xxxxxxxxxx wrote:
> > > Hi Pranith, thank you for your reply.
> > >
> > > Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/20 18:51:19:
> > >
> > > > From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> > > > To: li.ping288@xxxxxxxxxx, gluster-devel@xxxxxxxxxxx,
> > > > Date: 2016/01/20 18:51
> > > > Subject: Re: [Gluster-devel] Gluster AFR volume write performance has
> > > > been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
> > > >
> > > > Sorry for the delay in response.
> > >
> > > > On 01/15/2016 02:34 PM, li.ping288@xxxxxxxxxx wrote:
> > > > Setting GLUSTERFS_WRITE_IS_APPEND in the afr_writev function on the
> > > > glusterfs client side causes posix_writev on the server side to
> > > > process IO write fops serially instead of in parallel.
> > > >
> > > > i.e. multiple io-worker threads carrying out IO write fops are
> > > > blocked in posix_writev and execute the final write fop (pwrite/pwritev
> > > > in the __posix_writev function) ONE AFTER ANOTHER.
> > > >
> > > > For example:
> > > >
> > > > thread1: iot_worker -> ... -> posix_writev() |
> > > > thread2: iot_worker -> ... -> posix_writev() |
> > > > thread3: iot_worker -> ... -> posix_writev() -> __posix_writev()
> > > > thread4: iot_worker -> ... -> posix_writev() |
> > > >
> > > > There are 4 iot_worker threads doing 128KB IO write fops as
> > > > above, but only one can execute the __posix_writev function at a
> > > > time and the others have to wait.
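
The effect described above can be reproduced outside GlusterFS. The following is a minimal sketch in Python, not the brick code: `run_workers`, `inode_lock`, and the 50 ms sleep are assumptions made for illustration. It shows that when the size check and the write sit under one lock, at most one worker is ever inside the write step, whereas without the lock several workers overlap.

```python
import os
import tempfile
import threading
import time

BLOCK = 128 * 1024  # 128KB requests, matching the block size in this thread


def run_workers(serialize, workers=4):
    """Return the peak number of workers inside the write step at once."""
    inode_lock = threading.Lock()  # stands in for a per-file lock (assumption)
    stats_lock = threading.Lock()  # protects the counters below only
    state = {"inside": 0, "peak": 0}
    fd, path = tempfile.mkstemp()

    def write_step(index):
        with stats_lock:
            state["inside"] += 1
            state["peak"] = max(state["peak"], state["inside"])
        os.pwrite(fd, b"x" * BLOCK, index * BLOCK)
        time.sleep(0.05)  # simulated device latency
        with stats_lock:
            state["inside"] -= 1

    def worker(index):
        if serialize:
            # The size check and the write share one lock, so writers queue up.
            with inode_lock:
                size_before = os.fstat(fd).st_size
                _is_append = size_before == index * BLOCK
                write_step(index)
        else:
            write_step(index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)
        os.unlink(path)
    return state["peak"]


if __name__ == "__main__":
    print("peak writers with lock:", run_workers(serialize=True))
    print("peak writers without lock:", run_workers(serialize=False))
```

The lock is needed for the append check to be meaningful, because the file size must not change between the check and the write; that same requirement is what removes the parallelism.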
> > > >
> > > > However, if the AFR volume is configured with storage.linux-aio
> > > > on (it is off by default), the iot_worker will use posix_aio_writev
> > > > instead of posix_writev to write data.
> > > > The posix_aio_writev function is not affected by
> > > > GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write
> > > > performance goes up.
> > > > I think this is a bug :-(.
> > >
> > > Yeah, I agree with you. I suppose GLUSTERFS_WRITE_IS_APPEND is
> > > misused in afr_writev.
> > > I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND change on
> > > the review website:
> > > http://review.gluster.org/#/c/5501/
> > >
> > > The initial purpose seems to be to avoid an unnecessary fsync() in the
> > > afr_changelog_post_op_safe function when the write lands
> > > at the end of the file, detected by
> > > (preop.ia_size == offset || (fd->flags & O_APPEND)) in posix_writev.
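
The condition quoted above can be restated as a small predicate. This is a sketch in Python for readability only; the function name `write_is_append` and its parameter names are hypothetical, with `size_before_write` playing the role of `preop.ia_size` and `open_flags` the role of `fd->flags`.

```python
import os


def write_is_append(size_before_write, offset, open_flags):
    """Mirror of (preop.ia_size == offset || (fd->flags & O_APPEND))."""
    return size_before_write == offset or bool(open_flags & os.O_APPEND)


if __name__ == "__main__":
    # Write starting exactly at the current end of a 4096-byte file.
    print(write_is_append(4096, 4096, os.O_WRONLY))
    # Overwrite at offset 0 of the same file.
    print(write_is_append(4096, 0, os.O_WRONLY))
    # Any write on a descriptor opened with O_APPEND.
    print(write_is_append(4096, 0, os.O_WRONLY | os.O_APPEND))
```

The predicate only gives a reliable answer if the file size cannot change between reading it and performing the write, which is why the check and the write are done under a lock.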
> > >
> > > Compared with the AFR write performance loss, I think
> > > it costs too much.
> > >
> > > I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable,
> > > just like ensure-durability in AFR.
> > >
> > > You are right, it doesn't make sense to put this option in
> > > dictionary if ensure-durability is off. http://review.gluster.org/13285
> > > addresses this. Do you want to try this out?
> > > Thanks for doing most of the work :-). Do let me know if you want to
> > > raise a bug for this. Or I can take that up if you don't have time.
> > >
> > > Pranith
> > >
> > > >
> > > > So, my question is whether an AFR volume can work fine with the
> > > > storage.linux-aio configuration, which bypasses the
> > > > GLUSTERFS_WRITE_IS_APPEND setting in afr_writev,
> > > > and why glusterfs keeps posix_aio_writev different from posix_writev?
> > > >
> > > > Any replies to clear up my confusion would be appreciated. Thanks
> > > > in advance.
> > > > What is your workload? Multiple writers on the same file?
> > >
> > > I tested the AFR gluster volume with fio like this:
> > > fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k \
> > >     --size=20G --numjobs=8 --runtime=60 --group_reporting \
> > >     --name=afr_test --iodepth=1 --ioengine=libaio
> > >
> > > The GlusterFS bricks are two IBM X3550 M3 servers.
> > >
> > > Local disk direct write performance with a 128KB IO request block size
> > > is about 18MB/s
> > > with a single thread and 80MB/s with 8 threads.
> > >
> > > If GLUSTERFS_WRITE_IS_APPEND is set, the AFR gluster volume
> > > write performance is 18MB/s,
> > > the same as a single thread; if not, the performance is about 75MB/s.
> > > (Network bandwidth is sufficient.)
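
To put the figures above side by side, here is a short Python calculation using only the numbers reported in this message (18MB/s with the flag, about 75MB/s without it, 80MB/s for 8 local threads). The helper name `summarize` is hypothetical.

```python
def summarize(with_flag_mbps, without_flag_mbps, local_multi_mbps):
    """Return (speedup without the flag, fraction of local 8-thread rate)."""
    speedup = without_flag_mbps / with_flag_mbps
    fraction_of_local = without_flag_mbps / local_multi_mbps
    return speedup, fraction_of_local


speedup, fraction_of_local = summarize(18, 75, 80)

if __name__ == "__main__":
    print(f"speedup without the flag: {speedup:.2f}x")
    print(f"share of local 8-thread throughput: {fraction_of_local:.1%}")
```

In other words, without the flag the volume writes roughly four times faster and comes close to the local 8-thread rate, while with the flag it stays at the single-thread rate.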
> > >
> > > >
> > > > Pranith
> > > >
> > > >
> > > > --------------------------------------------------------
> > > > ZTE Information Security Notice: The information contained in this
> > > > mail (and any attachment transmitted herewith) is privileged and
> > > > confidential and is intended for the exclusive use of the addressee
> > > > (s). If you are not an intended recipient, any disclosure,
> > > > reproduction, distribution or other dissemination or use of the
> > > > information contained is strictly prohibited. If you have received
> > > > this mail in error, please delete it and notify us immediately.
> > > >
> > >
> > > >
> > > >
> > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >