Data migration and rebalance

jonathan.lefman at essess.com (Jonathan Lefman) · Sat, 24 Nov 2012 19:42:34 -0500

My gluster volume is more-or-less useless from an administration point of
view.  I am unable to stop the volume because it claims it is rebalancing
or gluster says the command failed.  When I try to stop, start, or get the
status of rebalancing, I get nothing returned.  I have stopped and
restarted all glusterfsd processes on each host.  Nothing seems to bring
sanity back to the volume.

This is bad news for gluster's reliability.  I am unable to find the source
of the problem.  Regular methods for resetting the system to a usable state
are not working.  I think it is time to call it quits and find another
solution.  Ceph?
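
For anyone else hitting this symptom: in the glusterd log quoted further down, the UUID that is refused the lock is the same UUID that holds it. Here is a minimal sketch, assuming the log line format matches that excerpt, of a Python check that flags such self-held lock entries. The function name and the regular expression are illustrative and not part of gluster, and the message wording may differ between gluster versions.

```python
import re

# Pattern modeled on the glusterd_lock error line quoted in this thread.
# Assumption: other gluster versions may word this message differently.
LOCK_RE = re.compile(
    r"Unable to get lock for uuid:\s*(?P<requester>[0-9a-fA-F-]{36}),"
    r"\s*lock held by:\s*(?P<holder>[0-9a-fA-F-]{36})"
)


def find_self_held_locks(log_text):
    """Return the UUIDs that were refused a lock they already hold."""
    hits = []
    for match in LOCK_RE.finditer(log_text):
        if match.group("requester") == match.group("holder"):
            hits.append(match.group("holder"))
    return hits


if __name__ == "__main__":
    # Sample line copied from the log excerpt in this thread.
    sample = (
        "[2012-11-23 13:07:09.102056] E [glusterd-utils.c:277:glusterd_lock] "
        "0-glusterd: Unable to get lock for uuid: "
        "ee33fd05-135e-40e7-a157-3c1e0b9be073, lock held by: "
        "ee33fd05-135e-40e7-a157-3c1e0b9be073"
    )
    print(find_self_held_locks(sample))
```

The check only reports the condition. It does not say why the lock was never released, and the lock in question is reported by glusterd (the management daemon), not by the glusterfsd processes.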

On Fri, Nov 23, 2012 at 1:21 PM, Jonathan Lefman <jonathan.lefman at essess.com> wrote:

> At the same time, when looking at the rebalance log, it appears that the
> rebalance is still going on in the background because I am seeing entries
> related to rebalancing.  However, the detail status command shows that the
> distribution for files is still stable on the older nodes.
>
>
>
> On Fri, Nov 23, 2012 at 1:10 PM, Jonathan Lefman <
> jonathan.lefman at essess.com> wrote:
>
>> Volume type:
>>
>> non-replicated, 29 nodes, xfs formats
>>
>> Number of files/directories:
>>
>> There are about 5000-10000 directories
>>
>> Average size of files:
>>
>> There are two distributions of files:  a vast majority of files is around
>> 200-300 kilobytes, with about 1000-fold fewer files with a size around 1
>> gigabyte
>>
>> Average number of files per directory:
>>
>> Around 1800 files per directory
>>
>> glusterd log below:
>>
>> When trying
>>
>> sudo gluster volume rebalance essess_data status
>>
>> OR
>>
>> sudo gluster volume status myvol
>> operation failed
>>
>> Log for this time from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
>>
>> [2012-11-23 13:05:00.489567] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
>> [2012-11-23 13:07:09.102007] I [glusterd-handler.c:2670:glusterd_handle_status_volume] 0-management: Received status volume req for volume essess_data
>> [2012-11-23 13:07:09.102056] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: ee33fd05-135e-40e7-a157-3c1e0b9be073, lock held by: ee33fd05-135e-40e7-a157-3c1e0b9be073
>> [2012-11-23 13:07:09.102073] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
>>
>>
>>
>>
>> On Fri, Nov 23, 2012 at 12:58 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>>> On 11/23/2012 11:14 PM, Jonathan Lefman wrote:
>>>
>>>> The rebalance command has run for quite a while.  Now when I issue the
>>>> rebalance status command,
>>>>
>>>> sudo gluster volume rebalance myvol status
>>>>
>>>> I get nothing back; just a return to the command prompt.  Any ideas of
>>>> what is going on?
>>>>
>>>>
>>> A few questions:
>>>
>>> - What is your volume type?
>>> - How many files and directories do you have in your volume?
>>> - What is the average size of files?
>>> - What is the average number of files per directory?
>>> - Can you please share glusterd logs from the time when the command
>>> returns without displaying any output?
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>
>>
>> --
>> *Jonathan Lefman, Ph.D.*
>> *Essess, Inc.*
>> 25 Thomson Place, Suite 460, Boston, MA 02210
>> o: 415-361-5488 x121 | e: jonathan.lefman at essess.com | *www.essess.com*
>>
>>
>
>
