Re: reading from local replica?

On 06/09/2015 09:21 AM, Ted Miller wrote:
On 6/8/2015 5:55 PM, Brian Ericson wrote:
Am I misunderstanding
cluster.read-subvolume/cluster.read-subvolume-index?

I have two regions, "A" and "B", with servers "a" and "b" in each
region, respectively.  I have clients in both regions.
Intra-region communication is fast, but the pipe between the regions
is terrible.  I'd like to limit inter-region communication to little
more than glusterfs write operations and have reads go to the server
in the same region as the client.

I have created a replica volume as:
gluster volume create gv0 replica 2 a:/data/brick1/gv0
b:/data/brick1/gv0 force

As a baseline, if I use scp to copy from the brick directly, I get --
for a 100M file -- times of about 6s if the client scps from the
server in the same region and anywhere from 3 to 5 minutes if the
client scps from the server in the other region.
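
For reference, the baseline timings come from commands along these
lines (paths are illustrative; one_hundred_mb_file stands in for the
100M test file):

# same-region baseline: client in region A copying from server "a"
time scp a:/data/brick1/gv0/one_hundred_mb_file /tmp/
# cross-region baseline: the same client copying from server "b" in region B
time scp b:/data/brick1/gv0/one_hundred_mb_file /tmp/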

I was under the impression (from something I read but can't now find)
that glusterfs automatically picks the fastest replica, but that has
not been my experience; glusterfs seems to generally prefer the server
in the other region over the "local" one, with times usually in excess
of 4 minutes.

I've also tried having clients mount the volume using the "xlator"
options cluster.read-subvolume and cluster.read-subvolume-index, but
neither seems to have any impact.  Here are sample mount commands to
show what I'm attempting:

mount -t glusterfs -o
xlator-option=cluster.read-subvolume=gv0-client-<0 or 1> a:/gv0
/mnt/glusterfs
mount -t glusterfs -o xlator-option=cluster.read-subvolume-index=<0 or
1> a:/gv0 /mnt/glusterfs

Am I misunderstanding how glusterfs works, particularly when trying to
"read locally"?  Is it possible to configure glusterfs to use a local
replica (or the "fastest replica") for reads?
I am not a developer, nor intimately familiar with the internals of
glusterfs, but here is how I understand glusterfs-fuse file reads to
work.
First, all replica bricks are read, to make sure they are consistent.
(If not, gluster tries to make them consistent before proceeding).
After consistency is established, the actual read occurs from the
brick with the shortest response time.  I don't know when or how the
response time is measured, but it seems to work for most people most of
the time.  (If the client is on one of the brick hosts, it will almost
always read from the local brick.)
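
One sanity check that may be worth doing is making sure the subvolume
name you pass to cluster.read-subvolume actually matches a client
translator in the generated volfile.  On the servers the volfiles
should live under /var/lib/glusterd/vols/gv0/ (exact file names vary
by version), so something like this ought to list the gv0-client-N
names:

grep -n 'volume gv0-client-' /var/lib/glusterd/vols/gv0/*.vol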

If the file reads involve a lot of small files, the consistency check
may be what is killing your response times, rather than the read of the
file itself.  Over a fast LAN, the consistency checks can take many
times the actual read time of the file.

Hopefully others will chime in with more details, but if you can
supply more information about what you are reading, that will help too.
Are you reading entire files, or just reading in a lot of "snippets" or
what?

Ted Miller
Elkhart, IN, USA

Thanks for the response! Your understanding matches mine after reading documentation and various posts -- this should just work, right?

My test consists of reading a 100M file which has been replicated to both regions by glusterfs. The specific command looks similar to:
time /bin/cp -f /mnt/glusterfs/one_hundred_mb_file /tmp

To avoid local reads, I'm invoking the "cp" on separate hosts in each region. I umount & mount /mnt/glusterfs prior to running the timed copy to avoid measuring a read from the (client-)local cache. The direct-scp timings show that same-region reads can take under 10s while between-region reads take minutes.
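
So the full sequence for one timed run looks roughly like this (the
read-subvolume value and file name are illustrative):

umount /mnt/glusterfs
mount -t glusterfs -o xlator-option=cluster.read-subvolume=gv0-client-0 a:/gv0 /mnt/glusterfs
time /bin/cp -f /mnt/glusterfs/one_hundred_mb_file /tmp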

Almost universally, the first timed "cp" of a 100M file takes minutes. This is true for clients in both regions and regardless of how I mount the volume (with or without read-subvolume/read-subvolume-index). Occasionally, however (maybe once in every 20 first reads), glusterfs will surprise me with fast times (reads of ~5-20s) that align with what I'd expect if it were going to a same-region glusterfs replica. I have never, however, seen this repeated: if a 100M file copies in under 20s and I immediately follow it up with a copy of another 100M file, the second file always takes many minutes.

It appears that cluster.read-subvolume and cluster.read-subvolume-index have no impact when passed as part of the client's mount command. I note that if I set this at the volume level (gluster volume set gv0 cluster.read-subvolume gv0-client-0), the impact is immediate: those lucky clients on the "right side" of the divide get fast times, while those on the "other side" get poor times. Again, however, I see no impact trying to override this as part of the mount command on the client.
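
For completeness, the volume-level change that does take effect was
made roughly like this (run on one of the servers); afterwards the
option should show up under "Options Reconfigured" in the volume info:

gluster volume set gv0 cluster.read-subvolume gv0-client-0
gluster volume info gv0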

So, maybe passing these options as part of the mount command doesn't work / is a no-op, but what I don't understand is why -- given that there is no measure by which glusterfs should conclude that the replica in the "other" region is faster than the replica in the "same" region. In fact, it appears as though glusterfs is *preferring* the slower replica.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



