I simulated a disk failure. There are 5 top-level directories in the gluster
volume, and I ran 5 find commands in parallel from the same client, one per
directory. After 24 hours, 500GB of the 700GB of data had healed, so I expect
it to complete in about 36 hours. Before the parallel finds it was 144 hours.

What I would like to ask is: can I increase the parallelism further by
pointing find at the sub-folders? Assume the directory structure below:

/mnt/gluster/a/x1
/mnt/gluster/a/x2
/mnt/gluster/b/x1
/mnt/gluster/b/x2
/mnt/gluster/c/x1
/mnt/gluster/c/x2
/mnt/gluster/d/x1
/mnt/gluster/d/x2
/mnt/gluster/e/x1
/mnt/gluster/e/x2

Can I run 10 different find commands from 10 different clients to speed up
heal performance?

From Client1: find /mnt/gluster/a/x1 -d -exec getfattr -h -n trusted.ec.heal {} \;
From Client2: find /mnt/gluster/a/x2 -d -exec getfattr -h -n trusted.ec.heal {} \;
...
...
From Client10: find /mnt/gluster/e/x2 -d -exec getfattr -h -n trusted.ec.heal {} \;
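
In other words, something like the untested sketch below, which just
illustrates the idea. It assumes bash, GNU find and the /mnt/gluster layout
above, and it reuses the getfattr heal trigger Pranith suggested:

  #!/bin/bash
  # Untested sketch: trigger index heals for every x* subdirectory in parallel.
  # Adjust MOUNT (and the brace patterns) to match the real layout.
  MOUNT=/mnt/gluster

  for dir in "$MOUNT"/{a,b,c,d,e}/x{1,2}; do
      # Same heal trigger as above, one background scan per directory;
      # the attribute output is not needed, so discard it.
      find "$dir" -d -exec getfattr -h -n trusted.ec.heal {} \; > /dev/null 2>&1 &
  done
  wait   # block until all heal-trigger scans have finished

To spread the work across 10 clients, each client would run only its own
subset of the directories instead of the full loop.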

Serkan

On Thu, Aug 11, 2016 at 11:49 AM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
> Heal completed, but I will try this by simulating a disk failure in the
> cluster and reply to you. Thanks for the help.
>
> On Thu, Aug 11, 2016 at 9:52 AM, Pranith Kumar Karampuri
> <pkarampu@xxxxxxxxxx> wrote:
>>
>> On Fri, Aug 5, 2016 at 8:37 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
>>>
>>> Hi again,
>>>
>>> I am now seeing the situation above in a production environment.
>>> One disk on one of my servers broke. I killed the brick process,
>>> replaced the disk, mounted it and then ran gluster v start force.
>>>
>>> For the first 24 hours after replacing the disk, the heal count below
>>> kept increasing, up to about 200,000 entries:
>>>
>>> gluster v heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"
>>> Number of entries: 205117
>>> Number of entries: 205231
>>> ...
>>> ...
>>> ...
>>>
>>> Over the next ~72 hours it decreased to 40K, and it is now going very
>>> slowly. What I am observing is a very, very slow heal speed. There are
>>> no errors in the brick logs. There was 900GB of data on the broken
>>> disk, and 96 hours after replacing it I see only 200GB healed.
>>> There are the warnings below in glustershd.log, but I think they are
>>> harmless:
>>>
>>> W [ec_combine.c:866:ec_combine_check] 0-v0-disperse-56: Mismatching xdata in answers of LOOKUP
>>> W [ec_common.c:116:ec_check_status] 0-v0-disperse-56: Operation failed on some subvolumes (up=FFFFF, mask=FFFFF, remaining=0, good=FFFF7, bad=8)
>>> W [ec_common.c:71:ec_heal_report] 0-v0-disperse-56: Heal failed [invalid argument]
>>>
>>> I tried turning on performance.client-io-threads but it did not change
>>> anything. At this rate the 900GB will take nearly 8 days to heal. What
>>> can I do?
>>
>> Sorry for the delay in response, do you still have this problem?
>> You can trigger heals using the following command:
>>
>> find <dir-you-are-interested> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> If you have 10 top-level directories, maybe you can spawn 10 such
>> processes.
>>
>>> Serkan
>>>
>>> On Fri, Apr 15, 2016 at 1:28 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
>>> wrote:
>>> > The 100TB is newly created files written while the brick was down. I
>>> > rethought the situation and realized that in case 1 I reformatted all
>>> > the bricks, so the write speed limit was 26*100MB/s (100MB/s per
>>> > disk); in case 2 I reformatted just one brick, so the write speed was
>>> > limited to 100MB/s for that single disk... I will repeat the tests
>>> > using one brick in both cases, once with a reformat and once with
>>> > just killing the brick process...
>>> > Thanks for the reply.
>>> >
>>> > On Fri, Apr 15, 2016 at 9:27 AM, Xavier Hernandez
>>> > <xhernandez@xxxxxxxxxx> wrote:
>>> >> Hi Serkan,
>>> >>
>>> >> sorry for the delay, I'm a bit busy lately.
>>> >>
>>> >> On 13/04/16 13:59, Serkan Çoban wrote:
>>> >>>
>>> >>> Hi Xavier,
>>> >>>
>>> >>> Can you help me with the issue below? How can I increase the
>>> >>> disperse heal speed?
>>> >>
>>> >> It seems weird. Is there any related message in the logs?
>>> >>
>>> >> In this particular test, are the 100TB modified files or newly
>>> >> created files while the brick was down?
>>> >>
>>> >> How many files have been modified?
>>> >>
>>> >>> Also, I would be grateful for detailed documentation about disperse
>>> >>> heal: why does heal happen on a disperse volume, and how is it
>>> >>> triggered? Which nodes participate in the heal process? Is there
>>> >>> any client interaction?
>>> >>
>>> >> The heal process is basically the same one used for replicate. There
>>> >> are two ways to trigger a self-heal:
>>> >>
>>> >> * when an inconsistency is detected, the client initiates a
>>> >>   background self-heal of the inode
>>> >>
>>> >> * the self-heal daemon scans the lists of modified files created by
>>> >>   the index xlator when a modification is made while some node is
>>> >>   down. All these files are self-healed.
>>> >>
>>> >> Xavi
>>> >>
>>> >>> Serkan
>>> >>>
>>> >>> ---------- Forwarded message ----------
>>> >>> From: Serkan Çoban <cobanserkan@xxxxxxxxx>
>>> >>> Date: Fri, Apr 8, 2016 at 5:46 PM
>>> >>> Subject: disperse heal speed up
>>> >>> To: Gluster Users <gluster-users@xxxxxxxxxxx>
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I am testing the heal speed of a disperse volume and what I see is
>>> >>> 5-10MB/s per node. I increased disperse.background-heals to 32 and
>>> >>> disperse.heal-wait-qlength to 256, but there is still no difference.
>>> >>> One thing I noticed is that when I kill a brick process, reformat it
>>> >>> and restart it, the heal speed is nearly 20x (200MB/s per node).
>>> >>>
>>> >>> But when I kill the brick, then write 100TB of data, and start the
>>> >>> brick afterwards, the heal is slow (5-10MB/s per node).
>>> >>>
>>> >>> What is the difference between the two scenarios? Why is one heal
>>> >>> slow and the other fast? How can I increase the disperse heal speed?
>>> >>> Should I increase the thread count to 128 or 256? I am on a
>>> >>> 78x(16+4) disperse volume and my servers are pretty strong (2x14
>>> >>> cores with 512GB RAM; each node has 26x8TB disks).
>>> >>>
>>> >>> Gluster version is 3.7.10.
>>> >>>
>>> >>> Thanks,
>>> >>> Serkan
>>
>> --
>> Pranith
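
For reference, the heal-related tunables and the monitoring command mentioned
in this thread, collected as plain gluster CLI calls. Volume name v0 is taken
from the thread, and the values are simply the ones already tried above, not
recommendations:

  # Tunables discussed in the thread (values Serkan reported trying)
  gluster volume set v0 disperse.background-heals 32
  gluster volume set v0 disperse.heal-wait-qlength 256
  gluster volume set v0 performance.client-io-threads on

  # Watch the pending-heal backlog while the find-based triggers run
  gluster volume heal v0 info | grep "Number of entries" | grep -v "Number of entries: 0"

  # Manually trigger heals for a directory tree (Pranith's suggestion)
  find <dir-you-are-interested> -d -exec getfattr -h -n trusted.ec.heal {} \;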