Hi again,

I am now seeing the situation described above in a production
environment. A disk on one of my servers broke. I killed the brick
process, replaced the disk, mounted it, and then ran "gluster v start
force".

For the first 24 hours after replacing the disk, the heal count kept
increasing, up to about 200,000 entries:

gluster v heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"
Number of entries: 205117
Number of entries: 205231
...
...
...

Over the next 72 hours it decreased to about 40K, and it is going very
slowly right now. What I am observing is a very, very slow heal speed.
There are no errors in the brick logs. There was 900GB of data on the
broken disk, and 96 hours after replacing it only about 200GB has been
healed. There are the warnings below in glustershd.log, but I think
they are harmless:

W [ec_combine.c:866:ec_combine_check] 0-v0-disperse-56: Mismatching xdata in answers of LOOKUP
W [ec_common.c:116:ec_check_status] 0-v0-disperse-56: Operation failed on some subvolumes (up=FFFFF, mask=FFFFF, remaining=0, good=FFFF7, bad=8)
W [ec_common.c:71:ec_heal_report] 0-v0-disperse-56: Heal failed [invalid argument]

I tried turning on performance.client-io-threads but it did not change
anything. At this rate it will take nearly 8 days to heal the 900GB of
data. What can I do?
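In case it helps, this is roughly the sequence and the tuning I have
used so far (the device name and brick mount point below are just
placeholders for my actual ones, and the mkfs step is only implied by
the disk replacement):

# recreate the brick filesystem on the new disk and mount it
mkfs.xfs -f /dev/sdX                      # placeholder device
mount /dev/sdX /bricks/brick56            # placeholder brick mount point

# bring the replaced brick back up
gluster volume start v0 force

# heal tuning tried so far
gluster volume set v0 disperse.background-heals 32
gluster volume set v0 disperse.heal-wait-qlength 256
gluster volume set v0 performance.client-io-threads on

# watch the backlog
gluster volume heal v0 info | grep "Number of entries"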
Serkan

On Fri, Apr 15, 2016 at 1:28 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
> The 100TB is newly created files, written while the brick was down. I
> rethought the situation and realized that in case 1 I reformatted all
> the bricks, so the write speed limit is 26*100MB/s; in case 2 I
> reformatted just one brick, so the write speed is limited to 100MB/s.
> I will repeat the tests using one brick in both cases, once with a
> reformat and once with just killing the brick process...
> Thanks for the reply.
>
> On Fri, Apr 15, 2016 at 9:27 AM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
>> Hi Serkan,
>>
>> sorry for the delay, I'm a bit busy lately.
>>
>> On 13/04/16 13:59, Serkan Çoban wrote:
>>>
>>> Hi Xavier,
>>>
>>> Can you help me with the issue below? How can I increase the disperse
>>> heal speed?
>>
>> It seems weird. Is there any related message in the logs?
>>
>> In this particular test, are the 100TB modified files or newly created
>> files while the brick was down?
>>
>> How many files have been modified?
>>
>>> Also, I would be grateful for any detailed documentation about
>>> disperse heal: why does heal happen on a disperse volume, and how is
>>> it triggered? Which nodes participate in the heal process? Is there
>>> any client interaction?
>>
>> The heal process is basically the same as the one used for replicate.
>> There are two ways to trigger a self-heal:
>>
>> * when an inconsistency is detected, the client initiates a background
>> self-heal of the inode
>>
>> * the self-heal daemon scans the lists of modified files created by the
>> index xlator when a modification is made while some node is down. All
>> these files are self-healed.
>>
>> Xavi
>>
>>> Serkan
>>>
>>> ---------- Forwarded message ----------
>>> From: Serkan Çoban <cobanserkan@xxxxxxxxx>
>>> Date: Fri, Apr 8, 2016 at 5:46 PM
>>> Subject: disperse heal speed up
>>> To: Gluster Users <gluster-users@xxxxxxxxxxx>
>>>
>>> Hi,
>>>
>>> I am testing the heal speed of a disperse volume and what I see is
>>> 5-10MB/s per node. I increased disperse.background-heals to 32 and
>>> disperse.heal-wait-qlength to 256, but still no difference.
>>>
>>> One thing I noticed is that when I kill a brick process, reformat it,
>>> and restart it, the heal speed is nearly 20x faster (200MB/s per node).
>>>
>>> But when I kill the brick, then write 100TB of data, and start the
>>> brick afterwards, heal is slow (5-10MB/s per node).
>>>
>>> What is the difference between the two scenarios? Why is one heal slow
>>> and the other fast? How can I increase the disperse heal speed? Should
>>> I increase the thread count to 128 or 256? I am on a 78x(16+4)
>>> disperse volume and my servers are pretty strong (2x14 cores with
>>> 512GB RAM; each node has 26x8TB disks).
>>>
>>> Gluster version is 3.7.10.
>>>
>>> Thanks,
>>> Serkan
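
One more note on the index xlator Xavi mentions above: if I understand
the on-disk layout correctly, the pending entries it tracks show up as
gfid-named links under .glusterfs/indices/xattrop on each brick, so the
per-brick backlog can also be counted directly (the brick path here is
again a placeholder for mine):

# count pending-heal index entries on one brick, skipping the base "xattrop-" file
ls /bricks/brick56/.glusterfs/indices/xattrop | grep -v '^xattrop' | wc -l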