Hi Pranith,
Thank you for your reply.
Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote on 2016/01/20 18:51:19:
> From: Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
> To: li.ping288@xxxxxxxxxx, gluster-devel@xxxxxxxxxxx
> Date: 2016/01/20 18:51
> Subject: Re: Gluster AFR volume write performance has been seriously affected by GLUSTERFS_WRITE_IS_APPEND in afr_writev
>
> Sorry for the delay in response.
> On 01/15/2016 02:34 PM, li.ping288@xxxxxxxxxx wrote:
> The GLUSTERFS_WRITE_IS_APPEND setting in the afr_writev function on the
> glusterfs client side makes posix_writev on the server side handle IO
> write fops serially instead of in parallel.
>
> i.e. multiple io-worker threads carrying out IO write fops are blocked
> in posix_writev and end up executing the final write fop (pwrite/pwritev)
> in the __posix_writev function ONE AFTER ANOTHER.
>
> For example:
>
> thread1: iot_worker -> ... -> posix_writev() |
> thread2: iot_worker -> ... -> posix_writev() |
> thread3: iot_worker -> ... -> posix_writev() -> __posix_writev()
> thread4: iot_worker -> ... -> posix_writev() |
>
> There are 4 iot_worker threads doing 128KB IO write fops as above, but
> only one of them can execute the __posix_writev function at a time; the
> others have to wait.
>
> However, if the afr volume is configured with storage.linux-aio (which
> is off by default), the iot_worker will use posix_aio_writev instead of
> posix_writev to write data. The posix_aio_writev function is not affected
> by GLUSTERFS_WRITE_IS_APPEND, and the AFR volume write performance goes up.
> I think this is a bug :-(.
Yeah, I agree with you. I suppose GLUSTERFS_WRITE_IS_APPEND is misused in
afr_writev. I checked the original intent of the GLUSTERFS_WRITE_IS_APPEND
change on the review website: http://review.gluster.org/#/c/5501/
The initial purpose seems to be to avoid an unnecessary fsync() in the
afr_changelog_post_op_safe function when the write lands at the current end
of the file, detected by (preop.ia_size == offset || (fd->flags & O_APPEND))
in posix_writev.
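In code form, my reading of that check is roughly the following (a
paraphrase only, not the exact GlusterFS source; the helper name is made
up):

#include <fcntl.h>       /* O_APPEND */
#include <stdbool.h>
#include <sys/types.h>   /* off_t */

/* Paraphrase of the append detection described above (not the exact
 * GlusterFS source): the write counts as an append when it starts at
 * the current end of the file, or when the fd was opened with O_APPEND.
 * Per the review discussion, that is the case where the extra fsync()
 * in afr_changelog_post_op_safe was meant to be avoided. */
static bool write_is_append(off_t size_before_write, off_t write_offset,
                            int open_flags)
{
    return (size_before_write == write_offset) ||
           ((open_flags & O_APPEND) != 0);
}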
In comparison with the AFR write performance loss, I think it costs too
much. I suggest making the GLUSTERFS_WRITE_IS_APPEND setting configurable,
just as ensure-durability is in afr.
You are right, it doesn't make sense to put this option in the dictionary
if ensure-durability is off. http://review.gluster.org/13285 addresses this.
Do you want to try this out?
Thanks for doing most of the work :-). Do let me know if you want to raise
a bug for this. Or I can take that up if you don't have time.
Pranith
>
> So, my question is whether the AFR volume can work fine with the
> storage.linux-aio configuration, which bypasses the
> GLUSTERFS_WRITE_IS_APPEND setting in afr_writev, and why glusterfs
> keeps posix_aio_writev different from posix_writev?
>
> Any replies to clear my confusion would be appreciated, and thanks in
> advance.
> What is the workload you have? Multiple writers on the same file?
I tested the afr gluster volume with fio like this:

fio --filename=/mnt/afr/20G.dat --direct=1 --rw=write --bs=128k --size=20G \
    --numjobs=8 --runtime=60 --group_reporting --name=afr_test --iodepth=1 \
    --ioengine=libaio
The glusterfs bricks are two IBM X3550 M3 servers.
The local disk direct-write performance for a 128KB IO request block size
is about 18MB/s with a single thread and 80MB/s with 8 threads.
If GLUSTERFS_WRITE_IS_APPEND is in effect, the afr gluster volume write
performance is 18MB/s, the same as a single thread; if not, the
performance is nearly 75MB/s (network bandwidth is sufficient).
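For reference, my understanding is that the AIO path differs in that the
write is handed to the kernel asynchronously instead of blocking the
io-worker thread in a synchronous pwritev(). Below is a minimal standalone
libaio sketch of that submission pattern; it only illustrates Linux native
AIO, it is not the actual posix_aio_writev code, and the file path and
sizes are made up:

/* build: cc -O2 aio_demo.c -o aio_demo -laio */
#define _GNU_SOURCE          /* O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    if (io_setup(8, &ctx) < 0) {
        fprintf(stderr, "io_setup failed\n");
        return 1;
    }

    /* placeholder path; must be on a filesystem that supports O_DIRECT */
    int fd = open("/data/aio-demo.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires an aligned buffer */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) return 1;
    memset(buf, 'a', 4096);

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, 4096, 0);   /* 4KB write at offset 0 */

    /* io_submit() queues the write and returns immediately, so the
     * submitting thread is free to prepare more IO instead of blocking
     * inside a synchronous pwritev(). */
    if (io_submit(ctx, 1, cbs) != 1) {
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    /* the completion is reaped later (or from another thread) */
    struct io_event ev;
    if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) {
        fprintf(stderr, "io_getevents failed\n");
        return 1;
    }
    printf("write completed, res=%ld\n", (long)ev.res);

    io_destroy(ctx);
    close(fd);
    free(buf);
    return 0;
}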
>
> Pranith
--------------------------------------------------------
ZTE Information Security Notice: The information contained in this mail (and any attachment transmitted herewith) is privileged and confidential and is intended for the exclusive use of the addressee(s). If you are not an intended recipient, any disclosure, reproduction, distribution or other dissemination or use of the information contained is strictly prohibited. If you have received this mail in error, please delete it and notify us immediately.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel