I simulated a disk failure. There are 5 top-level directories in the gluster
volume, and I ran 5 find commands in parallel from the same client, one per
directory. After 24 hours, 500GB of the 700GB of data had healed, so I expect
it to complete in about 36 hours. Before the parallel finds it was 144 hours.

What I would like to ask is: can I increase the parallelism further by
pointing find at the sub-folders? Assume the directory structure below:

/mnt/gluster/a/x1
/mnt/gluster/a/x2
/mnt/gluster/b/x1
/mnt/gluster/b/x2
/mnt/gluster/c/x1
/mnt/gluster/c/x2
/mnt/gluster/d/x1
/mnt/gluster/d/x2
/mnt/gluster/e/x1
/mnt/gluster/e/x2

Can I run 10 different find commands from 10 different clients to speed up
heal performance?

From Client1: find /mnt/gluster/a/x1 -d -exec getfattr -h -n trusted.ec.heal {} \;
From Client2: find /mnt/gluster/a/x2 -d -exec getfattr -h -n trusted.ec.heal {} \;
...
...
From Client10: find /mnt/gluster/e/x2 -d -exec getfattr -h -n trusted.ec.heal {} \;
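
In other words, something like the untested sketch below, which just
illustrates the idea. It assumes bash, GNU find and the /mnt/gluster layout
above, and it reuses the getfattr heal trigger Pranith suggested:

  #!/bin/bash
  # Untested sketch: trigger index heals for every x* subdirectory in parallel.
  # Adjust MOUNT (and the brace patterns) to match the real layout.
  MOUNT=/mnt/gluster

  for dir in "$MOUNT"/{a,b,c,d,e}/x{1,2}; do
      # Same heal trigger as above, one background scan per directory;
      # the attribute output is not needed, so discard it.
      find "$dir" -d -exec getfattr -h -n trusted.ec.heal {} \; > /dev/null 2>&1 &
  done
  wait   # block until all heal-trigger scans have finished

To spread the work across 10 clients, each client would run only its own
subset of the directories instead of the full loop.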

Serkan

On Thu, Aug 11, 2016 at 11:49 AM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
> Heal completed, but I will try this by simulating a disk failure in the
> cluster and reply to you. Thanks for the help.
>
> On Thu, Aug 11, 2016 at 9:52 AM, Pranith Kumar Karampuri
> <pkarampu@xxxxxxxxxx> wrote:
>>
>> On Fri, Aug 5, 2016 at 8:37 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
>>>
>>> Hi again,
>>>
>>> I am now seeing the situation above in a production environment.
>>> One disk on one of my servers broke. I killed the brick process,
>>> replaced the disk, mounted it and then ran gluster v start force.
>>>
>>> For the first 24 hours after replacing the disk, the heal count below
>>> kept increasing, up to about 200,000 entries:
>>>
>>> gluster v heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"
>>> Number of entries: 205117
>>> Number of entries: 205231
>>> ...
>>> ...
>>> ...
>>>
>>> Over the next ~72 hours it decreased to 40K, and it is now going very
>>> slowly. What I am observing is a very, very slow heal speed. There are
>>> no errors in the brick logs. There was 900GB of data on the broken
>>> disk, and 96 hours after replacing it I see only 200GB healed.
>>> There are the warnings below in glustershd.log, but I think they are
>>> harmless:
>>>
>>> W [ec_combine.c:866:ec_combine_check] 0-v0-disperse-56: Mismatching xdata in answers of LOOKUP
>>> W [ec_common.c:116:ec_check_status] 0-v0-disperse-56: Operation failed on some subvolumes (up=FFFFF, mask=FFFFF, remaining=0, good=FFFF7, bad=8)
>>> W [ec_common.c:71:ec_heal_report] 0-v0-disperse-56: Heal failed [invalid argument]
>>>
>>> I tried turning on performance.client-io-threads but it did not change
>>> anything. At this rate the 900GB will take nearly 8 days to heal. What
>>> can I do?
>>
>> Sorry for the delay in response, do you still have this problem?
>> You can trigger heals using the following command:
>>
>> find <dir-you-are-interested> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> If you have 10 top-level directories, maybe you can spawn 10 such
>> processes.
>>
>>> Serkan
>>>
>>> On Fri, Apr 15, 2016 at 1:28 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
>>> wrote:
>>> > The 100TB is newly created files written while the brick was down. I
>>> > rethought the situation and realized that in case 1 I reformatted all
>>> > the bricks, so the write speed limit was 26*100MB/s (100MB/s per
>>> > disk); in case 2 I reformatted just one brick, so the write speed was
>>> > limited to 100MB/s for that single disk... I will repeat the tests
>>> > using one brick in both cases, once with a reformat and once with
>>> > just killing the brick process...
>>> > Thanks for the reply.
>>> >
>>> > On Fri, Apr 15, 2016 at 9:27 AM, Xavier Hernandez
>>> > <xhernandez@xxxxxxxxxx> wrote:
>>> >> Hi Serkan,
>>> >>
>>> >> sorry for the delay, I'm a bit busy lately.
>>> >>
>>> >> On 13/04/16 13:59, Serkan Çoban wrote:
>>> >>>
>>> >>> Hi Xavier,
>>> >>>
>>> >>> Can you help me with the issue below? How can I increase the
>>> >>> disperse heal speed?
>>> >>
>>> >> It seems weird. Is there any related message in the logs?
>>> >>
>>> >> In this particular test, are the 100TB modified files or newly
>>> >> created files while the brick was down?
>>> >>
>>> >> How many files have been modified?
>>> >>
>>> >>> Also, I would be grateful for detailed documentation about disperse
>>> >>> heal: why does heal happen on a disperse volume, and how is it
>>> >>> triggered? Which nodes participate in the heal process? Is there
>>> >>> any client interaction?
>>> >>
>>> >> The heal process is basically the same one used for replicate. There
>>> >> are two ways to trigger a self-heal:
>>> >>
>>> >> * when an inconsistency is detected, the client initiates a
>>> >>   background self-heal of the inode
>>> >>
>>> >> * the self-heal daemon scans the lists of modified files created by
>>> >>   the index xlator when a modification is made while some node is
>>> >>   down. All these files are self-healed.
>>> >>
>>> >> Xavi
>>> >>
>>> >>> Serkan
>>> >>>
>>> >>> ---------- Forwarded message ----------
>>> >>> From: Serkan Çoban <cobanserkan@xxxxxxxxx>
>>> >>> Date: Fri, Apr 8, 2016 at 5:46 PM
>>> >>> Subject: disperse heal speed up
>>> >>> To: Gluster Users <gluster-users@xxxxxxxxxxx>
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I am testing the heal speed of a disperse volume and what I see is
>>> >>> 5-10MB/s per node. I increased disperse.background-heals to 32 and
>>> >>> disperse.heal-wait-qlength to 256, but there is still no difference.
>>> >>> One thing I noticed is that when I kill a brick process, reformat it
>>> >>> and restart it, the heal speed is nearly 20x (200MB/s per node).
>>> >>>
>>> >>> But when I kill the brick, then write 100TB of data, and start the
>>> >>> brick afterwards, the heal is slow (5-10MB/s per node).
>>> >>>
>>> >>> What is the difference between the two scenarios? Why is one heal
>>> >>> slow and the other fast? How can I increase the disperse heal speed?
>>> >>> Should I increase the thread count to 128 or 256? I am on a
>>> >>> 78x(16+4) disperse volume and my servers are pretty strong (2x14
>>> >>> cores with 512GB RAM; each node has 26x8TB disks).
>>> >>>
>>> >>> Gluster version is 3.7.10.
>>> >>>
>>> >>> Thanks,
>>> >>> Serkan
>>
>> --
>> Pranith
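
For reference, the heal-related tunables and the monitoring command mentioned
in this thread, collected as plain gluster CLI calls. Volume name v0 is taken
from the thread, and the values are simply the ones already tried above, not
recommendations:

  # Tunables discussed in the thread (values Serkan reported trying)
  gluster volume set v0 disperse.background-heals 32
  gluster volume set v0 disperse.heal-wait-qlength 256
  gluster volume set v0 performance.client-io-threads on

  # Watch the pending-heal backlog while the find-based triggers run
  gluster volume heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"

  # Manually trigger heals for a directory tree (Pranith's suggestion)
  find <dir-you-are-interested> -d -exec getfattr -h -n trusted.ec.heal {} \;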