On 6/8/2015 5:55 PM, Brian Ericson wrote:
Am I misunderstanding cluster.read-subvolume/cluster.read-subvolume-index?
I have two regions, "A" and "B", with server "a" in region A and server "b"
in region B. I have clients in both regions. Intra-region communication is
fast, but the pipe between the regions is terrible. I'd like to restrict
inter-region communication, as nearly as possible, to glusterfs write
operations and have reads go to the server in the client's own region.
I have created a replica volume as:
gluster volume create gv0 replica 2 a:/data/brick1/gv0 b:/data/brick1/gv0 force
As a baseline, if I use scp to copy from the brick directly, I get -- for a
100M file -- times of about 6s if the client scps from the server in the
same region and anywhere from 3 to 5 minutes if the client scps from the
server in the other region.
I was under the impression (from something I read but can't now find) that
glusterfs automatically picks the fastest replica, but that has not been my
experience; glusterfs seems to generally prefer the server in the other
region over the "local" one, with times usually in excess of 4 minutes.
I've also tried having clients mount the volume using the "xlator" options
cluster.read-subvolume and cluster.read-subvolume-index, but neither seems
to have any impact. Here are sample mount commands to show what I'm
attempting:
mount -t glusterfs -o xlator-option=cluster.read-subvolume=gv0-client-<0 or 1> a:/gv0 /mnt/glusterfs
mount -t glusterfs -o xlator-option=cluster.read-subvolume-index=<0 or 1> a:/gv0 /mnt/glusterfs
Am I misunderstanding how glusterfs works, particularly when trying to
"read locally"? Is it possible to configure glusterfs to use a local
replica (or the "fastest replica") for reads?
I am not a developer, nor intimately familiar with the insides of glusterfs,
but here is how I understand that glusterfs-fuse file reads work.
First, all replica bricks are read, to make sure they are consistent. (If
not, gluster tries to make them consistent before proceeding).
After consistency is established, then the actual read occurs from the brick
with the shortest response time. I don't know when or how the response time
is measured, but it seems to work for most people most of the time. (If the
client is on one of the brick hosts, it will almost always read from the
local brick.)
If the file reads involve a lot of small files, the consistency check may be
what is killing your response times, rather than the read of the file
itself. Over a fast LAN, the consistency checks can take many times the
actual read time of the file.
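One rough way to test that guess is to time the metadata lookups separately from the full reads. This is only a sketch: the helper names (elapsed, stat_all, read_all) and the GLUSTER_MNT variable are made up for illustration, the default path is the mount point from the mount commands above, and client-side caching may make the second pass look faster than it really is.

```shell
#!/bin/sh
# Hypothetical helpers (not part of glusterfs): compare the time spent on
# lookups alone with the time spent on lookups plus data transfer.

# Run a command, discard its output, and print the elapsed whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Touch only metadata: one "ls -l" line per regular file under $1.
stat_all() {
    find "$1" -type f -exec ls -l {} +
}

# Touch metadata and data: stream every regular file under $1.
read_all() {
    find "$1" -type f -exec cat {} +
}

# GLUSTER_MNT is an assumed variable; it defaults to the mount point
# used in the mount commands quoted earlier.
DIR=${GLUSTER_MNT:-/mnt/glusterfs}
if [ -d "$DIR" ]; then
    echo "lookup seconds: $(elapsed stat_all "$DIR")"
    echo "read seconds:   $(elapsed read_all "$DIR")"
fi
```

If the lookup time is close to the read time, the per-file overhead is the more likely culprit; if the read time dominates, the data is probably crossing the slow link.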
Hopefully others will chime in with more information, but if you can supply
more information about what you are reading, that will help too. Are you
reading entire files, or just reading in a lot of "snippets" or what?
Ted Miller
Elkhart, IN, USA