I am copying 10,000 files to a gluster volume using mapreduce on the clients. Each map process takes one file at a time and copies it to the gluster volume. My disperse volume consists of 78 subvolumes of 16+4 disks each, so if I copy more than 78 files in parallel I expect each file to go to a different subvolume, right? In my tests with fio I can see every file going to a different subvolume, but when I start the mapreduce process from the clients, only 78/3 = 26 subvolumes are used for writing files. I can see that clearly from the network traffic. The mapreduce job on the client side can run multi-threaded; I tested with 1, 5 and 10 threads on each client, but every time only 26 subvolumes were used. How can I debug this further?

On Tue, Apr 19, 2016 at 11:22 AM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
> Hi Serkan,
>
> On 19/04/16 09:18, Serkan Çoban wrote:
>>
>> Hi, I just did a fresh reinstall of 3.7.11 and I am seeing the same
>> behavior: 50 clients are copying files named part-0-xxxx to gluster
>> using mapreduce, one thread per server, and they are using only 20
>> servers out of 60. On the other hand, fio tests use all the servers.
>> Is there anything I can do to solve the issue?
>
> Distribution of files to ec sets is done by dht. In theory, if you create
> many files, each ec set will receive roughly the same number of files.
> However, when the number of files is small enough, statistics can fail.
>
> Not sure what you are doing exactly, but a mapreduce procedure generally
> creates only a single output. In that case it makes sense that only one
> ec set is used. If you want to use all ec sets for a single file, you
> should enable sharding (I haven't tested that) or split the result into
> multiple files.
>
> Xavi
>
>> Thanks,
>> Serkan
>>
>> ---------- Forwarded message ----------
>> From: Serkan Çoban <cobanserkan@xxxxxxxxx>
>> Date: Mon, Apr 18, 2016 at 2:39 PM
>> Subject: disperse volume file to subvolume mapping
>> To: Gluster Users <gluster-users@xxxxxxxxxxx>
>>
>> Hi, I have a problem where clients are using only 1/3 of the nodes in a
>> disperse volume for writing.
>> I am testing from 50 clients using 1 to 10 threads, with file names of
>> the form part-0-xxxx.
>> What I see is that clients use only 20 nodes for writing. How is the
>> file-name-to-subvolume hashing done? Could this be related to the file
>> names being similar?
>>
>> My cluster is on 3.7.10 with 60 nodes, each with 26 disks. The disperse
>> volume is 78 x (16+4), and only 26 of the 78 subvolumes are used during
>> writes.
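
For reference, a minimal Python sketch of the hashing scheme discussed above: dht places a file by hashing its name into a 32-bit space that is divided into ranges, one range per subvolume. This is only an illustration, not Gluster's code — the real dht hashes names with the Davies-Meyer function (gf_dm_hashfn) and reads the ranges from each directory's layout, while this sketch substitutes zlib.crc32 and an equal split of the hash space, so its output says nothing definitive about Gluster's behavior.

import zlib

NUM_SUBVOLUMES = 78  # 78 ec sets of 16+4, as in this thread

def subvolume_for(name):
    # Hash the file name into the 32-bit space. crc32 is a stand-in for
    # Gluster's Davies-Meyer hash, used here only for illustration.
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF
    # Split the 32-bit space into equal ranges, one per subvolume
    # (real dht reads these ranges from the directory layout instead).
    range_size = (0xFFFFFFFF // NUM_SUBVOLUMES) + 1
    return h // range_size

# Count how many distinct subvolumes a batch of similarly named files hits.
names = ["part-0-%04d" % i for i in range(10000)]
used = {subvolume_for(n) for n in names}
print("subvolumes used: %d of %d" % (len(used), NUM_SUBVOLUMES))

With a reasonably uniform hash, 10,000 names like part-0-0000 through part-0-9999 should spread across all 78 ranges, so if only 26 are hit in practice it may be worth checking whether the files are first created under different names (for example temporary names that are renamed afterwards): dht hashes the name used at create time, and a rename does not move the file to another subvolume.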