Hi Jaden,
Sorry, from your subject I misunderstood your setup.
In a pure distributed volume, each file goes to one brick and only
that brick. That single brick is chosen by the elastic hash
algorithm.
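To make that concrete, here is a toy sketch of the placement rule in Python (the brick names and the modulo mapping are just an illustration I made up; GlusterFS's real DHT hashes the file name against hash ranges assigned to each brick):

# Toy sketch: "one file -> one brick" in a pure distributed volume.
# Illustrative only; not GlusterFS's actual DHT implementation.
import hashlib

BRICKS = ["server1:/brick1", "server2:/brick2"]  # hypothetical 2-node layout

def brick_for(filename: str) -> str:
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return BRICKS[h % len(BRICKS)]

for name in ("file-a", "file-b", "file-c"):
    print(name, "->", brick_for(name))

# Every write to a given file lands on that single brick, so one writer can
# never go faster than that brick's disk and network path.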
If you get near wire speed (1Gbps, about 120MB/s) when writing several
files at once, but only roughly half that speed when writing a single
file, each brick may be limiting the write speed: a "green" SATA disk
spinning at 5400rpm reaches at most about 75MB/s writing big files
sequentially (an enterprise SATA disk spinning at 7200rpm reaches
around 115MB/s).
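As a rough back-of-the-envelope check (these are the nominal figures above, not measurements from your system):

# Throughput ceilings implied by the estimates above; illustrative only.
WIRE_MBPS = 120              # ~1Gbps Ethernet payload rate in MB/s
GREEN_SATA_MBPS = 75         # 5400rpm "green" SATA, big sequential writes
ENTERPRISE_SATA_MBPS = 115   # 7200rpm enterprise SATA

# A single file goes to a single brick, so it is capped by the slower of
# that brick's disk and the wire.
print("one file, green disk:", min(WIRE_MBPS, GREEN_SATA_MBPS), "MB/s")
# Several files spread across 2 bricks can add up to wire speed.
print("many files, 2 green disks:", min(WIRE_MBPS, 2 * GREEN_SATA_MBPS), "MB/s")

That would fit a "half wire speed for one file, full wire speed for many" pattern.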
Can you please explain which type of bricks you have on each
server node?
I'll try to emulate your setup and test it.
Thank you!
On 04/09/14 at 03:20, Jaden Liang wrote:
Hi Ramon,
I am running the gluster FUSE client.
I may not have stated my testing environment clearly. Let me
explain. The volume is configured on 2 servers. There is no
replication at all, just a distributed volume, so I don't think it
is a replicated-data issue. Actually, we can reach 100MB/s
when writing multiple files at the same time.
If you are using the NFS client, write data goes to one of the
nodes of the replica pair and that node sends the replica write
data to the other node. If you are using one switch for the
client and server connections and one 1GbE port on each
device, data received by the first node is re-sent to
the other node simultaneously and, in theory, you may
reach speeds closer to 100MB/s.
In the case of the gluster FUSE client, write data goes
simultaneously to both server nodes, using half the bandwidth
of the client's 1GbE port for each, because replication is done
on the client side; that results in a write speed of around
50MB/s (<60MB/s).
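A quick way to see where that ~50-60MB/s figure comes from (nominal numbers, ignoring protocol overhead):

# Client-side replication sends every application byte twice through the
# same 1GbE port, so the single-client ceiling is roughly wire speed / 2.
wire_mb_s = 125          # ~1Gbps in MB/s, before protocol overhead
replicas = 2             # one copy per server node of the replica pair
ceiling = wire_mb_s / replicas
print(f"single-client write ceiling with replica {replicas}: ~{ceiling:.0f} MB/s")
# Real protocol overhead pushes this down into the 50-60MB/s range.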
I hope this helps.
On 03/09/14 at 07:02, Jaden Liang wrote:
Hi all,
We did some more tests and analysis yesterday. It
looks like 50MB/s is about the theoretical top speed for a
replica 1 volume over a 1Gbps network. GlusterFS writes
data one 128KB block at a time, then waits for the reply.
Transferring 128KB takes about 1ms on a 1Gbps network, and on the
server side it takes about 800us to 1000us to write the
128KB to the HDD and return, plus another 100us to
200us of other overhead. So GlusterFS takes about 2ms-2.2ms
to finish writing one 128KB block, which is about
50MB/s.
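As a sanity check of that model (just arithmetic on the figures in this thread; the 2665us value is the measured per-request mean quoted below):

# throughput = block size / per-block round-trip time
BLOCK = 128 * 1024   # bytes per write request

def mb_per_s(rtt_us: float) -> float:
    return BLOCK / (rtt_us / 1e6) / 1e6

for rtt_us in (2000, 2200, 2665):
    print(f"{rtt_us}us per block -> ~{mb_per_s(rtt_us):.0f} MB/s")
# The measured ~2.6ms mean is what lands right around the observed ~50MB/s.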
The question is: why doesn't GlusterFS use pipelined
writing or reading to speed up this chatty process?
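To make the pipelining point concrete, here is a minimal model (an assumption for illustration, not how GlusterFS's write path is actually implemented): if N write requests are kept in flight instead of one, the round-trip latency is overlapped and throughput is bounded by the wire rather than by the round trip.

# Stop-and-wait vs pipelined writes, as a simple throughput model.
BLOCK = 128 * 1024     # bytes per write request
RTT_S = 0.0026         # ~2.6ms per request, from the stats quoted below
WIRE_B_S = 120e6       # ~1Gbps Ethernet payload rate in bytes/s

def throughput(in_flight: int) -> float:
    # Up to `in_flight` blocks complete per round trip, capped by the wire.
    return min(in_flight * BLOCK / RTT_S, WIRE_B_S)

for n in (1, 2, 4, 8):
    print(f"{n} outstanding request(s): ~{throughput(n) / 1e6:.0f} MB/s")
# 1 in flight -> ~50MB/s (the single-file case); a handful in flight would
# already saturate the 1GbE link in this simple model.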
We are running a performance test on a replica 1
volume and found that single-file sequential write
performance only reaches about 50MB/s over 1Gbps
Ethernet. However, if we test sequential writes of
multiple files, write performance can go up to
120MB/s, which is the top speed of the network.
We also tried to use the stat xlator to find out
where the bottleneck of single-file write
performance is. Here is the stat data:
Note that the test writes a single 1GB file
sequentially to a replica 1 volume over a 1Gbps
Ethernet network.
On the client side, we can see there are 8192
write requests in total, and every request writes
128KB of data. The total elapsed time is 21834371us, about
21 seconds. The mean time per request is 2665us,
about 2.6ms, which means it can only serve about
380 requests per second. There is other time spent in
calls such as statfs and lookup, but those are not the
major reasons.
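Those client-side figures are self-consistent; this is just checking the arithmetic on the numbers already quoted:

# Sanity check of the client-side statistics above.
requests = 8192
block = 128 * 1024        # bytes per write request
total_us = 21834371       # total elapsed time
mean_us = 2665            # mean time per write request

print("total data:", requests * block / 2**30, "GiB")                            # 1.0 GiB
print("requests per second:", round(1e6 / mean_us))                              # ~375
print("throughput:", round(requests * block / (total_us / 1e6) / 1e6), "MB/s")   # ~49 MB/s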
On the server side, the mean time per request is
751us, including writing the data to the HDD. So we think
that is not the major reason.
We also modified some code to measure the elapsed
time of the system epoll path. It only took about
20us from enqueuing the data to finishing the send.
Now we are looking into the RPC mechanism in
GlusterFS. Still, we think this issue may have been
encountered before by the gluster-devel or gluster-users communities.
Therefore, any suggestions would be appreciated. Does
anyone know of such an issue?