Same result. I also checked the rebalance.log file; it has no reference
to the part files either...

On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
> Can you try a 'gluster volume rebalance v0 start force' ?
>
> On 21/04/16 14:23, Serkan Çoban wrote:
>>> Has the rebalance operation finished successfully ? has it skipped
>>> any files ?
>>
>> Yes, according to 'gluster v rebalance status' it completed without
>> any errors. The rebalance status report looks like this:
>>
>> Node        Rebalanced files   size     Scanned   failures   skipped
>> 1.1.1.185   158                29GB     1720      0          314
>> 1.1.1.205   93                 46.5GB   761       0          95
>> 1.1.1.225   74                 37GB     779       0          94
>>
>> All other hosts have 0 values.
>>
>> I double checked that the files with '---------T' attributes are
>> there; maybe some of them were deleted, but I still see them in the
>> bricks...
>> I am also concerned that the part files were not distributed to all
>> 60 nodes. Shouldn't rebalance do that?
>>
>> On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx>
>> wrote:
>>> Hi Serkan,
>>>
>>> On 21/04/16 12:39, Serkan Çoban wrote:
>>>> I started a 'gluster v rebalance v0 start' command hoping that it
>>>> would equally redistribute files across the 60 nodes, but it did
>>>> not do that... Why didn't it redistribute the files? Any thoughts?
>>>
>>> Has the rebalance operation finished successfully ? has it skipped
>>> any files ?
>>>
>>> After a successful rebalance all files with attributes '---------T'
>>> should have disappeared.
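>>>
>>> For example, a check like this run on a server should come up empty
>>> after a successful rebalance (a sketch; DHT link files are the
>>> zero-length, mode-1000 files carrying the dht.linkto xattr, adjust
>>> the brick paths to yours):
>>>
>>> find /bricks/*/s1 -type f -perm 1000 -size 0 \
>>>     -exec getfattr --absolute-names -n trusted.glusterfs.dht.linkto {} \; \
>>>     2>/dev/null
>>>
>>> If this still prints entries after the rebalance, the data was not
>>> migrated.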
>>>
>>>> On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez
>>>> <xhernandez@xxxxxxxxxx> wrote:
>>>>> Hi Serkan,
>>>>>
>>>>> On 21/04/16 10:07, Serkan Çoban wrote:
>>>>>>> I think the problem is in the temporary name that distcp gives
>>>>>>> to the file while it's being copied before renaming it to the
>>>>>>> real name. Do you know what is the structure of this name ?
>>>>>>
>>>>>> The distcp temporary file name format is
>>>>>> ".distcp.tmp.attempt_1460381790773_0248_m_000001_0", and the same
>>>>>> temporary file name is reused by one map process. For example, I
>>>>>> see in the logs that one map copies the files part-m-00031,
>>>>>> part-m-00047 and part-m-00063 sequentially, and they all use the
>>>>>> same temporary file name above. So no part of the original file
>>>>>> name appears in the temporary file name.
>>>>>
>>>>> This explains the problem. With the default options, DHT sends all
>>>>> files to the subvolume that should store a file named
>>>>> 'distcp.tmp'.
>>>>>
>>>>> With this temporary name format, little can be done.
>>>>>
>>>>>> I will check if we can modify distcp's behaviour; otherwise we
>>>>>> will have to write our own mapreduce procedures instead of using
>>>>>> distcp.
>>>>>>
>>>>>>> 2. define the option 'extra-hash-regex' to an expression that
>>>>>>> matches your temporary file names and returns the same name the
>>>>>>> file will finally have. Depending on the differences between
>>>>>>> original and temporary file names, this option could be useless.
>>>>>>> 3. set the option 'rsync-hash-regex' to 'none'. This will
>>>>>>> prevent the name conversion, so the files will be evenly
>>>>>>> distributed. However this will cause a lot of files placed in
>>>>>>> incorrect subvolumes, creating a lot of link files until a
>>>>>>> rebalance is executed.
>>>>>>
>>>>>> How can I set these options?
>>>>>
>>>>> You can set gluster options using:
>>>>>
>>>>> gluster volume set <volname> <option> <value>
>>>>>
>>>>> for example:
>>>>>
>>>>> gluster volume set v0 rsync-hash-regex none
>>>>>
>>>>> Xavi
>>>>>
>>>>>> On Thu, Apr 21, 2016 at 10:00 AM, Xavier Hernandez
>>>>>> <xhernandez@xxxxxxxxxx> wrote:
>>>>>>> Hi Serkan,
>>>>>>>
>>>>>>> I think the problem is in the temporary name that distcp gives
>>>>>>> to the file while it's being copied before renaming it to the
>>>>>>> real name. Do you know what is the structure of this name ?
>>>>>>>
>>>>>>> DHT selects the subvolume (in this case the ec set) on which the
>>>>>>> file will be stored based on the name of the file. This is a
>>>>>>> problem when a file is being renamed, because the rename could
>>>>>>> change the subvolume where the file should be found.
>>>>>>>
>>>>>>> DHT has a feature to avoid incorrect file placements when
>>>>>>> executing renames for the rsync case. What it does is check
>>>>>>> whether the file name matches the following regular expression:
>>>>>>>
>>>>>>> ^\.(.+)\.[^.]+$
>>>>>>>
>>>>>>> If a match is found, it only considers the part between
>>>>>>> parentheses to calculate the destination subvolume.
>>>>>>>
>>>>>>> This is useful for rsync because temporary file names are
>>>>>>> constructed in the following way: suppose the original filename
>>>>>>> is 'test'. The temporary filename while rsync is running is made
>>>>>>> by prepending a dot and appending '.<random chars>': .test.712hd
>>>>>>>
>>>>>>> As you can see, the original name and the part of the name
>>>>>>> between parentheses that matches the regular expression are the
>>>>>>> same. So, after the temporary file is renamed to its original
>>>>>>> filename, both names are considered to belong to the same
>>>>>>> subvolume by DHT.
>>>>>>>
>>>>>>> In your case it's very probable that distcp uses a temporary
>>>>>>> name like '.part.<number>'. In that case the portion of the name
>>>>>>> used to select the subvolume is always 'part'. This would
>>>>>>> explain why all files go to the same subvolume. Once a file is
>>>>>>> renamed to its final name, DHT realizes that it should go to
>>>>>>> another subvolume. At this point it creates a link file (those
>>>>>>> files with access rights '---------T') in the correct subvolume,
>>>>>>> but it doesn't move the data. As you can see, this kind of file
>>>>>>> is better balanced.
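>>>>>>>
>>>>>>> As an illustration, here is what the regex reduces each name to
>>>>>>> before hashing (a bash sketch; DHT does this internally, and
>>>>>>> '.part.00031' is a hypothetical temporary name following the
>>>>>>> guess above):
>>>>>>>
>>>>>>> re='^\.(.+)\.[^.]+$'
>>>>>>> for name in .test.712hd .part.00031; do
>>>>>>>     if [[ $name =~ $re ]]; then
>>>>>>>         echo "$name -> ${BASH_REMATCH[1]}"  # hash only the captured part
>>>>>>>     else
>>>>>>>         echo "$name -> $name"               # no match: hash the full name
>>>>>>>     fi
>>>>>>> done
>>>>>>> # .test.712hd -> test   (same hash as the final name 'test')
>>>>>>> # .part.00031 -> part   (every temporary name hashes as 'part')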
>>>>>>>
>>>>>>> To solve this problem you have three options:
>>>>>>>
>>>>>>> 1. change the temporary filename used by distcp to correctly
>>>>>>> match the regular expression. I'm not sure if this can be
>>>>>>> configured, but if it is possible, this is the best option.
>>>>>>>
>>>>>>> 2. define the option 'extra-hash-regex' to an expression that
>>>>>>> matches your temporary file names and returns the same name the
>>>>>>> file will finally have. Depending on the differences between
>>>>>>> original and temporary file names, this option could be useless.
>>>>>>>
>>>>>>> 3. set the option 'rsync-hash-regex' to 'none'. This will
>>>>>>> prevent the name conversion, so the files will be evenly
>>>>>>> distributed. However this will cause a lot of files placed in
>>>>>>> incorrect subvolumes, creating a lot of link files until a
>>>>>>> rebalance is executed.
>>>>>>>
>>>>>>> Xavi
>>>>>>>
>>>>>>> On 20/04/16 14:13, Serkan Çoban wrote:
>>>>>>>> Here are the steps I follow in detail, and the relevant output
>>>>>>>> from the bricks:
>>>>>>>>
>>>>>>>> I am using the command below for volume creation:
>>>>>>>> gluster volume create v0 disperse 20 redundancy 4 \
>>>>>>>> 1.1.1.{185..204}:/bricks/02 \
>>>>>>>> 1.1.1.{205..224}:/bricks/02 \
>>>>>>>> 1.1.1.{225..244}:/bricks/02 \
>>>>>>>> 1.1.1.{185..204}:/bricks/03 \
>>>>>>>> 1.1.1.{205..224}:/bricks/03 \
>>>>>>>> 1.1.1.{225..244}:/bricks/03 \
>>>>>>>> 1.1.1.{185..204}:/bricks/04 \
>>>>>>>> 1.1.1.{205..224}:/bricks/04 \
>>>>>>>> 1.1.1.{225..244}:/bricks/04 \
>>>>>>>> 1.1.1.{185..204}:/bricks/05 \
>>>>>>>> 1.1.1.{205..224}:/bricks/05 \
>>>>>>>> 1.1.1.{225..244}:/bricks/05 \
>>>>>>>> 1.1.1.{185..204}:/bricks/06 \
>>>>>>>> 1.1.1.{205..224}:/bricks/06 \
>>>>>>>> 1.1.1.{225..244}:/bricks/06 \
>>>>>>>> 1.1.1.{185..204}:/bricks/07 \
>>>>>>>> 1.1.1.{205..224}:/bricks/07 \
>>>>>>>> 1.1.1.{225..244}:/bricks/07 \
>>>>>>>> 1.1.1.{185..204}:/bricks/08 \
>>>>>>>> 1.1.1.{205..224}:/bricks/08 \
>>>>>>>> 1.1.1.{225..244}:/bricks/08 \
>>>>>>>> 1.1.1.{185..204}:/bricks/09 \
>>>>>>>> 1.1.1.{205..224}:/bricks/09 \
>>>>>>>> 1.1.1.{225..244}:/bricks/09 \
>>>>>>>> 1.1.1.{185..204}:/bricks/10 \
>>>>>>>> 1.1.1.{205..224}:/bricks/10 \
>>>>>>>> 1.1.1.{225..244}:/bricks/10 \
>>>>>>>> 1.1.1.{185..204}:/bricks/11 \
>>>>>>>> 1.1.1.{205..224}:/bricks/11 \
>>>>>>>> 1.1.1.{225..244}:/bricks/11 \
>>>>>>>> 1.1.1.{185..204}:/bricks/12 \
>>>>>>>> 1.1.1.{205..224}:/bricks/12 \
>>>>>>>> 1.1.1.{225..244}:/bricks/12 \
>>>>>>>> 1.1.1.{185..204}:/bricks/13 \
>>>>>>>> 1.1.1.{205..224}:/bricks/13 \
>>>>>>>> 1.1.1.{225..244}:/bricks/13 \
>>>>>>>> 1.1.1.{185..204}:/bricks/14 \
>>>>>>>> 1.1.1.{205..224}:/bricks/14 \
>>>>>>>> 1.1.1.{225..244}:/bricks/14 \
>>>>>>>> 1.1.1.{185..204}:/bricks/15 \
>>>>>>>> 1.1.1.{205..224}:/bricks/15 \
>>>>>>>> 1.1.1.{225..244}:/bricks/15 \
>>>>>>>> 1.1.1.{185..204}:/bricks/16 \
>>>>>>>> 1.1.1.{205..224}:/bricks/16 \
>>>>>>>> 1.1.1.{225..244}:/bricks/16 \
>>>>>>>> 1.1.1.{185..204}:/bricks/17 \
>>>>>>>> 1.1.1.{205..224}:/bricks/17 \
>>>>>>>> 1.1.1.{225..244}:/bricks/17 \
>>>>>>>> 1.1.1.{185..204}:/bricks/18 \
>>>>>>>> 1.1.1.{205..224}:/bricks/18 \
>>>>>>>> 1.1.1.{225..244}:/bricks/18 \
>>>>>>>> 1.1.1.{185..204}:/bricks/19 \
>>>>>>>> 1.1.1.{205..224}:/bricks/19 \
>>>>>>>> 1.1.1.{225..244}:/bricks/19 \
>>>>>>>> 1.1.1.{185..204}:/bricks/20 \
>>>>>>>> 1.1.1.{205..224}:/bricks/20 \
>>>>>>>> 1.1.1.{225..244}:/bricks/20 \
>>>>>>>> 1.1.1.{185..204}:/bricks/21 \
>>>>>>>> 1.1.1.{205..224}:/bricks/21 \
>>>>>>>> 1.1.1.{225..244}:/bricks/21 \
>>>>>>>> 1.1.1.{185..204}:/bricks/22 \
>>>>>>>> 1.1.1.{205..224}:/bricks/22 \
>>>>>>>> 1.1.1.{225..244}:/bricks/22 \
>>>>>>>> 1.1.1.{185..204}:/bricks/23 \
>>>>>>>> 1.1.1.{205..224}:/bricks/23 \
>>>>>>>> 1.1.1.{225..244}:/bricks/23 \
>>>>>>>> 1.1.1.{185..204}:/bricks/24 \
>>>>>>>> 1.1.1.{205..224}:/bricks/24 \
>>>>>>>> 1.1.1.{225..244}:/bricks/24 \
>>>>>>>> 1.1.1.{185..204}:/bricks/25 \
>>>>>>>> 1.1.1.{205..224}:/bricks/25 \
>>>>>>>> 1.1.1.{225..244}:/bricks/25 \
>>>>>>>> 1.1.1.{185..204}:/bricks/26 \
>>>>>>>> 1.1.1.{205..224}:/bricks/26 \
>>>>>>>> 1.1.1.{225..244}:/bricks/26 \
>>>>>>>> 1.1.1.{185..204}:/bricks/27 \
>>>>>>>> 1.1.1.{205..224}:/bricks/27 \
>>>>>>>> 1.1.1.{225..244}:/bricks/27 force
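>>>>>>>>
>>>>>>>> (As an aside, the same 1560-brick list can be generated with a
>>>>>>>> short bash sketch instead of the 78 literal lines above; same
>>>>>>>> servers, same order, and gluster groups the bricks in the order
>>>>>>>> given into 1560/20 = 78 ec sets:)
>>>>>>>>
>>>>>>>> bricks=()
>>>>>>>> for d in $(seq -w 2 27); do   # brick dirs 02..27
>>>>>>>>     for r in '{185..204}' '{205..224}' '{225..244}'; do
>>>>>>>>         bricks+=( $(eval echo "1.1.1.$r:/bricks/$d") )
>>>>>>>>     done
>>>>>>>> done
>>>>>>>> gluster volume create v0 disperse 20 redundancy 4 "${bricks[@]}" force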
>>>>>>>>
>>>>>>>> Then I mount the volume on 50 clients:
>>>>>>>> mount -t glusterfs 1.1.1.185:/v0 /mnt/gluster
>>>>>>>>
>>>>>>>> Then I make a directory from one of the clients and chmod it:
>>>>>>>> mkdir /mnt/gluster/s1 && chmod 777 /mnt/gluster/s1
>>>>>>>>
>>>>>>>> Then I start distcp on the clients. There are 1059 files of
>>>>>>>> 8.8GB each in one folder, and they will be copied to
>>>>>>>> /mnt/gluster/s1 with 100 parallel maps, which means 2 copy jobs
>>>>>>>> per client at the same time:
>>>>>>>> hadoop distcp -m 100 http://nn1:8020/path/to/teragen-10tb
>>>>>>>> file:///mnt/gluster/s1
>>>>>>>>
>>>>>>>> After the job finished, here is the status of the s1 directory
>>>>>>>> from the bricks:
>>>>>>>> The s1 directory is present in all 1560 bricks.
>>>>>>>> The s1/teragen-10tb folder is present in all 1560 bricks.
>>>>>>>>
>>>>>>>> Full listing of the files in the bricks:
>>>>>>>> https://www.dropbox.com/s/rbgdxmrtwz8oya8/teragen_list.zip?dl=0
>>>>>>>>
>>>>>>>> You can ignore the .crc files in the brick output above; they
>>>>>>>> are checksum files...
>>>>>>>>
>>>>>>>> As you can see, the part-m-xxxx files were written to only some
>>>>>>>> bricks on nodes 0205..0224.
>>>>>>>> All bricks have some files, but those have zero size.
>>>>>>>>
>>>>>>>> I increased file descriptors to 65k, so that is not the
>>>>>>>> issue...
>>>>>>>>
>>>>>>>> On Wed, Apr 20, 2016 at 9:34 AM, Xavier Hernandez
>>>>>>>> <xhernandez@xxxxxxxxxx> wrote:
>>>>>>>>> Hi Serkan,
>>>>>>>>>
>>>>>>>>> On 19/04/16 15:16, Serkan Çoban wrote:
>>>>>>>>>>>>> I assume that gluster is used to store the intermediate
>>>>>>>>>>>>> files before the reduce phase
>>>>>>>>>>
>>>>>>>>>> Nope, gluster is the destination for the distcp command:
>>>>>>>>>> hadoop distcp -m 50 http://nn1:8020/path/to/folder
>>>>>>>>>> file:///mnt/gluster
>>>>>>>>>> This runs maps on the datanodes, all of which have
>>>>>>>>>> /mnt/gluster mounted.
>>>>>>>>>
>>>>>>>>> I don't know hadoop, so I'm of little help here. However it
>>>>>>>>> seems that '-m 50' means to execute 50 copies in parallel.
>>>>>>>>> This means that even if the distribution worked fine, at most
>>>>>>>>> 50 (probably fewer) of the 78 ec sets would be used in
>>>>>>>>> parallel.
>>>>>>>>>
>>>>>>>>>>>>> This means that this is caused by some peculiarity of the
>>>>>>>>>>>>> mapreduce.
>>>>>>>>>>
>>>>>>>>>> Yes, but how can a client write 500 files to the gluster
>>>>>>>>>> mount and have those files written to only a subset of the
>>>>>>>>>> subvolumes? I cannot use gluster as a backup cluster if I
>>>>>>>>>> cannot write with distcp.
>>>>>>>>>
>>>>>>>>> All 500 files were created on only one of the 78 ec sets and
>>>>>>>>> the remaining 77 stayed empty ?
>>>>>>>>>
>>>>>>>>>>>>> You should look which files are created in each brick and
>>>>>>>>>>>>> how many while the process is running.
>>>>>>>>>>
>>>>>>>>>> Files were only created on nodes 185..204, or 205..224, or
>>>>>>>>>> 225..244; only on 20 nodes in each test.
>>>>>>>>>
>>>>>>>>> How many files were there in each brick ?
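>>>>>>>>>
>>>>>>>>> For example, something like this run on each server would give
>>>>>>>>> the per-brick counts (a rough sketch; it skips the zero-length
>>>>>>>>> link files and the .crc checksum files):
>>>>>>>>>
>>>>>>>>> for b in /bricks/*/s1/teragen-10tb; do
>>>>>>>>>     echo "$b: $(find "$b" -type f ! -size 0 ! -name '*.crc' | wc -l)"
>>>>>>>>> done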
>>>>>>>>>
>>>>>>>>> Not sure if this can be related, but standard linux
>>>>>>>>> distributions have a default limit of 1024 open file
>>>>>>>>> descriptors. With such a big volume and a massive copy, maybe
>>>>>>>>> this limit is affecting something ?
>>>>>>>>>
>>>>>>>>> Are there any error or warning messages in the mount or brick
>>>>>>>>> logs ?
>>>>>>>>>
>>>>>>>>> Xavi
>>>>>>>>>
>>>>>>>>>> On Tue, Apr 19, 2016 at 1:05 PM, Xavier Hernandez
>>>>>>>>>> <xhernandez@xxxxxxxxxx> wrote:
>>>>>>>>>>> Hi Serkan,
>>>>>>>>>>>
>>>>>>>>>>> moved to gluster-users since this doesn't belong to the
>>>>>>>>>>> devel list.
>>>>>>>>>>>
>>>>>>>>>>> On 19/04/16 11:24, Serkan Çoban wrote:
>>>>>>>>>>>> I am copying 10,000 files to the gluster volume using
>>>>>>>>>>>> mapreduce on the clients. Each map process takes one file
>>>>>>>>>>>> at a time and copies it to the gluster volume.
>>>>>>>>>>>
>>>>>>>>>>> I assume that gluster is used to store the intermediate
>>>>>>>>>>> files before the reduce phase.
>>>>>>>>>>>
>>>>>>>>>>>> My disperse volume consists of 78 subvolumes of 16+4 disks
>>>>>>>>>>>> each. So if I copy more than 78 files in parallel, I expect
>>>>>>>>>>>> each file to go to a different subvolume, right?
>>>>>>>>>>>
>>>>>>>>>>> If you only copy 78 files, most probably some subvolumes
>>>>>>>>>>> will stay empty and others will get more than one or two
>>>>>>>>>>> files. It's not an exact distribution, it's a statistically
>>>>>>>>>>> balanced one: over time and with enough files, each brick
>>>>>>>>>>> will contain a number of files of the same order of
>>>>>>>>>>> magnitude, but they won't have the *same* number of files.
>>>>>>>>>>>
>>>>>>>>>>>> In my tests with fio I can see every file going to a
>>>>>>>>>>>> different subvolume, but when I start the mapreduce process
>>>>>>>>>>>> from the clients, only 78/3 = 26 subvolumes are used for
>>>>>>>>>>>> writing files.
>>>>>>>>>>>
>>>>>>>>>>> This means that this is caused by some peculiarity of the
>>>>>>>>>>> mapreduce.
>>>>>>>>>>>
>>>>>>>>>>>> I can see that clearly from the network traffic. Mapreduce
>>>>>>>>>>>> on the client side can run multi-threaded. I tested with 1,
>>>>>>>>>>>> 5 and 10 threads on each client, but every time only 26
>>>>>>>>>>>> subvolumes were used.
>>>>>>>>>>>> How can I debug the issue further?
>>>>>>>>>>>
>>>>>>>>>>> You should look at which files are created in each brick,
>>>>>>>>>>> and how many, while the process is running.
>>>>>>>>>>>
>>>>>>>>>>> Xavi
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 19, 2016 at 11:22 AM, Xavier Hernandez
>>>>>>>>>>>> <xhernandez@xxxxxxxxxx> wrote:
>>>>>>>>>>>>> Hi Serkan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19/04/16 09:18, Serkan Çoban wrote:
>>>>>>>>>>>>>> Hi, I just reinstalled a fresh 3.7.11 and I am seeing the
>>>>>>>>>>>>>> same behavior: 50 clients copying files named part-0-xxxx
>>>>>>>>>>>>>> to gluster using mapreduce with one thread per server,
>>>>>>>>>>>>>> and they are using only 20 servers out of 60. On the
>>>>>>>>>>>>>> other hand the fio tests use all the servers. Is there
>>>>>>>>>>>>>> anything I can do to solve the issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Distribution of files to ec sets is done by DHT. In
>>>>>>>>>>>>> theory, if you create many files, each ec set will receive
>>>>>>>>>>>>> the same amount of files. However, when the number of
>>>>>>>>>>>>> files is small enough, statistics can fail.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure what you are doing exactly, but a mapreduce
>>>>>>>>>>>>> procedure generally creates only a single output. In that
>>>>>>>>>>>>> case it makes sense that only one ec set is used. If you
>>>>>>>>>>>>> want to use all ec sets for a single file, you should
>>>>>>>>>>>>> enable sharding (I haven't tested that) or split the
>>>>>>>>>>>>> result into multiple files.
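>>>>>>>>>>>>>
>>>>>>>>>>>>> Sharding is enabled with normal volume options, something
>>>>>>>>>>>>> like this (a sketch; the block size is just an example
>>>>>>>>>>>>> value):
>>>>>>>>>>>>>
>>>>>>>>>>>>> gluster volume set <volname> features.shard on
>>>>>>>>>>>>> gluster volume set <volname> features.shard-block-size 64MB
>>>>>>>>>>>>>
>>>>>>>>>>>>> Each file is then stored as fixed-size shards that are
>>>>>>>>>>>>> distributed independently, so even a single large file
>>>>>>>>>>>>> spreads over several ec sets.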
>>>>>>>>>>>>>
>>>>>>>>>>>>> Xavi
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Serkan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>> From: Serkan Çoban <cobanserkan@xxxxxxxxx>
>>>>>>>>>>>>>> Date: Mon, Apr 18, 2016 at 2:39 PM
>>>>>>>>>>>>>> Subject: disperse volume file to subvolume mapping
>>>>>>>>>>>>>> To: Gluster Users <gluster-users@xxxxxxxxxxx>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, I have a problem where clients are using only 1/3 of
>>>>>>>>>>>>>> the nodes in a disperse volume for writing.
>>>>>>>>>>>>>> I am testing from 50 clients using 1 to 10 threads, with
>>>>>>>>>>>>>> file names like part-0-xxxx.
>>>>>>>>>>>>>> What I see is that the clients only use 20 nodes for
>>>>>>>>>>>>>> writing. How is the file name to subvolume hashing done?
>>>>>>>>>>>>>> Is this related to the file names being similar?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My cluster is 3.7.10 with 60 nodes, each with 26 disks.
>>>>>>>>>>>>>> The disperse volume is 78 x (16+4). Only 26 out of 78
>>>>>>>>>>>>>> subvolumes are used during writes..

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users