Thanks for the logs. This is caused by another issue, 'gfid mismatches', which
triggers a locking deadlock in replicate and makes each fix-layout request take
30 minutes (in your case I see the frame timeout is set to the very low value
of 30, which is why you see the bail-out many times in 5 minutes). That
explains the slowness of the whole operation. Please plan to upgrade to version
3.2.4, which has most of the fixes related to gfid mismatch issues.

Regards,
Amar

On Fri, Oct 21, 2011 at 12:23 PM, Changliang Chen <hqucocl at gmail.com> wrote:
> Any help?
>
> We notice that when the errors below appear, the rebalance fix-layout
> becomes very slow; the count increases by only about 4 every five minutes.
>
> E [rpc-clnt.c:199:call_bail] dfs-client-0: bailing out frame type(GlusterFS
> 3.1) op(INODELK(29)) xid = 0x755696 sent = 2011-10-20 06:20:51.217782.
> timeout = 30
>
> W [afr-self-heal-common.c:584:afr_sh_pending_to_delta]
> afr_sh_pending_to_delta: Unable to get dict value.
>
> I [dht-common.c:369:dht_revalidate_cbk] dfs-dht: subvolume
> 19loudfs-replicate-2 returned -1 (Invalid argument)
>
> On Tue, Oct 18, 2011 at 5:45 PM, Changliang Chen <hqucocl at gmail.com> wrote:
>
>> Thanks Amar, but it looks like v3.1.1 does not support the command
>>
>> 'gluster volume rebalance dfs migrate-data start'
>>
>> # gluster volume rebalance dfs migrate-data start
>> Usage: volume rebalance <VOLNAME> <start|stop|status>
>> Rebalance of Volume dfs failed
>>
>> On Tue, Oct 18, 2011 at 3:33 PM, Amar Tumballi <amar at gluster.com> wrote:
>>
>>> Hi Chen,
>>>
>>> Can you restart 'glusterd', run 'gluster volume rebalance dfs
>>> migrate-data start', and check whether your data migration happens?
>>>
>>> Regards,
>>> Amar
>>>
>>> On Tue, Oct 18, 2011 at 12:54 PM, Changliang Chen <hqucocl at gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> we have a rebalance running on eight bricks since July, and this is
>>>> what the status looks like right now:
>>>>
>>>> ===Tue Oct 18 13:45:01 CST 2011 ====
>>>> rebalance step 1: layout fix in progress: fixed layout 223623
>>>>
>>>> There are roughly 8T of photos in the storage, so how long should this
>>>> rebalance take?
>>>>
>>>> What does the number (in this case 223623) represent?
>>>>
>>>> Our gluster information:
>>>> Repository revision: v3.1.1
>>>> Volume Name: dfs
>>>> Type: Distributed-Replicate
>>>> Status: Started
>>>> Number of Bricks: 4 x 2 = 8
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 10.1.1.23:/data0
>>>> Brick2: 10.1.1.24:/data0
>>>> Brick3: 10.1.1.25:/data0
>>>> Brick4: 10.1.1.26:/data0
>>>> Brick5: 10.1.1.27:/data0
>>>> Brick6: 10.1.1.28:/data0
>>>> Brick7: 10.1.1.64:/data0
>>>> Brick8: 10.1.1.65:/data0
>>>> Options Reconfigured:
>>>> cluster.min-free-disk: 10%
>>>> network.ping-timeout: 25
>>>> network.frame-timeout: 30
>>>> performance.cache-max-file-size: 512KB
>>>> performance.cache-size: 3GB
>>>>
>>>> --
>>>>
>>>> Regards,
>>>>
>>>> Cocl
>>>> OM manager
>>>> 19lou Operation & Maintenance Dept
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
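
The bailed-out frames in the log trace back to the `network.frame-timeout: 30` setting shown under "Options Reconfigured", which is far below the GlusterFS default of 1800 seconds. A minimal sketch of restoring the default with the standard `gluster volume set` command (the volume name `dfs` is taken from the thread; this is an admin command to run on any server in the cluster, not something verified against this setup):

```shell
# Raise the frame timeout from 30s back to the GlusterFS default of 1800s,
# so slow (but still progressing) lock requests during fix-layout are not
# repeatedly bailed out by call_bail.
gluster volume set dfs network.frame-timeout 1800

# Confirm the change; the option appears under "Options Reconfigured".
gluster volume info dfs
```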
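
Amar's gfid-mismatch diagnosis can be spot-checked by hand on the bricks: GlusterFS stores each file's gfid in the `trusted.gfid` extended attribute, and the two bricks of a replica pair must agree on it. A hedged sketch, assuming a hypothetical file path (the `/data0` brick mount and the 10.1.1.23/10.1.1.24 replica pairing come from the volume info in the thread):

```shell
# On brick server 10.1.1.23:
getfattr -n trusted.gfid -e hex /data0/path/to/photo.jpg

# On its replica partner, 10.1.1.24, for the same relative path:
getfattr -n trusted.gfid -e hex /data0/path/to/photo.jpg

# If the two hex values differ, that file has a gfid mismatch, and
# self-heal locking on it can wedge as described above.
```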