glusterfs missing files on ls

stefano.sinigardi at gmail.com (Stefano Sinigardi) · Sun, 2 Jun 2013 15:05:44 +0900

Dear Vijay,
the brick filesystem is ext4, on a GPT-partitioned disk, formatted by Ubuntu 12.10.
The rebalance was started with the command

gluster volume rebalance data start

but according to the log it got stuck on a file whose name I cannot remember
(it was a small working .cpp file). The log said the file was going to be
moved to a much more occupied replica, and it repeated this message until
the log had grown to a few GB.
Then I stopped it and restarted with

gluster volume rebalance data start force

in order to get rid of these problems about files going to bricks that were
already highly occupied.
Because I was almost stuck, and remembering that a rebalance had solved
another problem of mine as if by miracle, I retried it, but it got stuck in
a .dropbox-cache folder. That folder is not very important, so I thought I
could remove it. I launched a script that finds all the files by looking at
all the bricks, but removes them from the FUSE mountpoint. I don't know what
went wrong (the script is very simple; maybe the problem was that it was
4 am), but the fact is that the files got removed by calling rm at the brick
mountpoints, not at the FUSE one. So I think that I'm now in an even worse
situation than before. I have stopped working on it and asked my colleagues
for some time (at least the data is still there on the bricks, just
scattered across all of them) so that I can think carefully about how to
proceed. Maybe I will destroy the volume and rebuild it, but that will be
very time-consuming, as I don't have much free space elsewhere to save
everything, and it is also very difficult to save the data from the FUSE
mountpoint, since it does not list all the files.
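
For reference, the intended procedure (find the file names on the bricks,
but delete them only through the FUSE mount) can be sketched as a dry-run
script. The mountpoint /mnt/data, the brick paths and the to_fuse_path
helper are hypothetical, not my real setup. The script only prints the rm
commands, and it assumes that a lookup by full path through the FUSE mount
reaches a file even when a directory listing does not show it.

```shell
#!/bin/sh
# Hypothetical sketch: mount point, brick roots and target directory are
# assumptions for illustration, not the real configuration.
set -eu

FUSE_MOUNT="/mnt/data"                  # hypothetical FUSE mount of volume "data"
BRICKS="/export/brick1 /export/brick2"  # hypothetical brick roots on this server
TARGET=".dropbox-cache"                 # directory to clean, relative to the volume root

# Map a path found on a brick ($1 = brick root, $2 = full path on the brick)
# to the same file as seen through the FUSE mount.
to_fuse_path() {
    printf '%s/%s\n' "$FUSE_MOUNT" "${2#"$1"/}"
}

for brick in $BRICKS; do
    # Skip bricks that hold no part of the target directory.
    [ -d "$brick/$TARGET" ] || continue
    # Read-only scan of the brick; file names containing newlines are not handled.
    find "$brick/$TARGET" -type f | while IFS= read -r f; do
        # Dry run: print a command that targets the FUSE path, never the brick path.
        echo rm -f -- "$(to_fuse_path "$brick" "$f")"
    done
done
```

The same file shows up on both bricks of a replica pair, so some commands
are printed twice; with rm -f the second one is harmless. Only after
checking the printed paths would the echo be removed.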

Thanks a lot for your support.
In any case, I'm really learning a lot.

    Stefano

On Sun, Jun 2, 2013 at 2:52 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> On 05/31/2013 03:18 PM, Stefano Sinigardi wrote:
>
>> Dear Xavier,
>> I realized that the volume was not built properly when doing the first
>> analyses suggested by Davide, but I'm sure that this is not the problem,
>> so I quickly dismissed it. Also, we need a replica, but not so
>> strictly; maybe in the future, with the next volume, I'll build it
>> properly. Anyway, yes, the volume was created on "pedrillo" as a
>> replica-2 and the next day was expanded onto "osmino", again with
>> replica-2, just by adding bricks and doing a rebalance, which was only
>> attempted. I say "attempted" because it got "stuck", consuming a lot of
>> RAM (almost all of the 16 GB), and it was counting millions of files that
>> I think don't even exist on the volume, so I stopped it. Do you think it
>> might be worth restarting?
>>
>
> I might have missed this detail in the thread. What is the disk filesystem
> on the bricks?
>
> Can you list the exact rebalance command that was triggered?
>
>
> Thanks,
> Vijay
>