Hi Serkan,

On 19/04/16 09:18, Serkan Çoban wrote:
Hi, I just reinstalled a fresh 3.7.11 and I am seeing the same behavior. 50 clients are copying files named part-0-xxxx to gluster via mapreduce, one thread per server, and they are using only 20 servers out of 60. On the other hand, fio tests use all the servers. Is there anything I can do to solve the issue?
Distribution of files to EC sets is done by DHT, which hashes each file name to pick a subvolume. In theory, if you create many files, each EC set will receive roughly the same number of files. However, when the number of files is small, the statistics can fail and the spread across EC sets can be quite uneven.
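To illustrate the statistics, here is a toy sketch only: it uses Python's hashlib instead of Gluster's actual Davies-Meyer hash (gf_dm_hashfn), and a simple modulo instead of the per-directory layout ranges DHT really uses, so it shows the effect of few vs. many files, not the exact mapping:

# Toy sketch: hash each file name to one of the EC sets and count
# how many distinct sets end up being used.
import hashlib
from collections import Counter

SUBVOLS = 78  # e.g. a 78 x (16+4) disperse volume

def pick_subvol(name):
    # 32-bit hash of the file name chooses the EC set
    h = int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")
    return h % SUBVOLS

for nfiles in (50, 100000):
    names = ["part-0-%04d" % i for i in range(nfiles)]
    used = len(Counter(pick_subvol(n) for n in names))
    print("%d files -> %d of %d subvolumes used" % (nfiles, used, SUBVOLS))

With only a few tens of files you typically see well under 78 sets used (collisions leave some sets empty), while with many files all sets are hit.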
I'm not sure what you are doing exactly, but a mapreduce job generally creates only a single output. In that case it makes sense that only one EC set is used. If you want to use all EC sets for a single file, you should enable sharding (I haven't tested that) or split the result into multiple files.
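For reference, sharding is enabled per volume with something like the following (the block size shown is just an example value; as said, I haven't tested sharding on disperse volumes):

gluster volume set <volname> features.shard on
gluster volume set <volname> features.shard-block-size 64MB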
Xavi
Thanks,
Serkan

---------- Forwarded message ----------
From: Serkan Çoban <cobanserkan@xxxxxxxxx>
Date: Mon, Apr 18, 2016 at 2:39 PM
Subject: disperse volume file to subvolume mapping
To: Gluster Users <gluster-users@xxxxxxxxxxx>

Hi, I have a problem where clients are using only 1/3 of the nodes in a disperse volume for writing. I am testing from 50 clients using 1 to 10 threads, with file names part-0-xxxx. What I see is that clients only use 20 nodes for writing. How is the file-name-to-subvolume hashing done? Is this related to the file names being similar? My cluster is 3.7.10 with 60 nodes, each with 26 disks. The disperse volume is 78 x (16+4). Only 26 out of 78 subvolumes are used during writes.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel